ext4 won't mount - fsck required - 2nd fsck in less than a week

linux-ext4.vger.kernel.org archive mirror

* ext4 won't mount - fsck required - 2nd fsck in less than a week
@ 2012-09-10  2:34 Terry
  2012-09-10  2:47 ` Theodore Ts'o
  0 siblings, 1 reply; 12+ messages in thread
From: Terry @ 2012-09-10  2:34 UTC
  To: linux-ext4

Hello,

As the subject says, we have a 15 TB fsck drive that won't mount with
these errors:

Sep 9 20:02:20 narf kernel: EXT4-fs (dm-9): ext4_check_descriptors:
Inode bitmap for group 3200 not in group (block 4161027887)!
Sep 9 20:02:20 narf kernel: EXT4-fs (dm-9): group descriptors corrupted!

We did a proactive fsck on Tuesday of last week because it was
starting to give filesystem errors. It ran through and mounted fine.

The filesystem lives on an EqualLogic SAN spread across 36 drives.
Could this be a problem with the physical layer, or is it normal to
need multiple rounds of fsck to fully fix an issue?

Thanks!


* Re: ext4 won't mount - fsck required - 2nd fsck in less than a week
  2012-09-10  2:34 ext4 won't mount - fsck required - 2nd fsck in less than a week Terry
@ 2012-09-10  2:47 ` Theodore Ts'o
  2012-09-10  2:53   ` Terry
  0 siblings, 1 reply; 12+ messages in thread
From: Theodore Ts'o @ 2012-09-10  2:47 UTC
  To: Terry; +Cc: linux-ext4

On Sun, Sep 09, 2012 at 09:34:10PM -0500, Terry wrote:
> 
> As the subject says, we have a 15 TB fsck drive that won't mount with
> these errors:
> 
> Sep 9 20:02:20 narf kernel: EXT4-fs (dm-9): ext4_check_descriptors:
> Inode bitmap for group 3200 not in group (block 4161027887)!
> Sep 9 20:02:20 narf kernel: EXT4-fs (dm-9): group descriptors corrupted!

These indicate a very basic file system corruption where the block
group descriptors are corrupted.  E2fsck will complain immediately
upon seeing this sort of fs inconsistency, and the first thing it will
try to do is fix it.

> We did a proactive fsck on Tuesday of last week because it was
> starting to give filesystem errors. It ran through and mounted fine.
> 
> The filesystem lives on an equallogic SAN spread across 36 drives.
> Could this be something with the physical layer or is it not abnormal
> to have to run multiple rounds of fsck to fully fix an issue?

This is most probably a hardware problem; normally e2fsck will fix
file system corruptions (and certainly problems such as corrupt block
group descriptors) in a single pass.  If e2fsck finished and the file
system mounted fine last week, and now you're getting this kind of
error, it basically screams some kind of physical layer problem, or
perhaps a bad hard drive, or perhaps the SAN disk is getting
incorrectly written to by some other system, etc.

	    	       	       	     - Ted
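
For illustration, the first log line can be read as a failed range check:
the inode bitmap of a block group is expected to live inside a known block
range, and block 4161027887 does not.  The following is a minimal Python
sketch of that kind of check, not the kernel's actual code.  The 4 KiB block
size, 32768 blocks per group, first data block 0, and a size of exactly
15 TiB are all assumptions; none of them are stated in the thread.

```python
# Simplified model of a block group descriptor range check.
# Assumed, not taken from the thread: 4 KiB blocks, 32768 blocks per group,
# first data block 0, and a filesystem of exactly 15 TiB.

BLOCK_SIZE = 4096
BLOCKS_PER_GROUP = 32768
FIRST_DATA_BLOCK = 0
TOTAL_BLOCKS = 15 * 2**40 // BLOCK_SIZE


def group_block_range(group):
    """Return (first, last) block numbers covered by a block group."""
    first = FIRST_DATA_BLOCK + group * BLOCKS_PER_GROUP
    last = min(first + BLOCKS_PER_GROUP - 1, TOTAL_BLOCKS - 1)
    return first, last


def bitmap_location_ok(block, group, flex_bg=False):
    """Check whether a bitmap block lies in the allowed range.

    Without flex_bg the bitmap must sit inside its own group; with
    flex_bg it may sit anywhere inside the filesystem.
    """
    if flex_bg:
        first, last = FIRST_DATA_BLOCK, TOTAL_BLOCKS - 1
    else:
        first, last = group_block_range(group)
    return first <= block <= last


if __name__ == "__main__":
    print(group_block_range(3200))
    print(bitmap_location_ok(4161027887, 3200))
    print(bitmap_location_ok(4161027887, 3200, flex_bg=True))
```

Under these assumptions the reported block is far outside group 3200 and
also past the end of the filesystem, so it fails the check either way.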


* Re: ext4 won't mount - fsck required - 2nd fsck in less than a week
  2012-09-10  2:47 ` Theodore Ts'o
@ 2012-09-10  2:53   ` Terry
  2012-09-10  3:18     ` Terry
  0 siblings, 1 reply; 12+ messages in thread
From: Terry @ 2012-09-10  2:53 UTC
  To: Theodore Ts'o; +Cc: linux-ext4

Thanks for the reply.  It is part of a RHEL cluster, but we have not
had any situations where multiple systems mounted the filesystem.  It
is an old SAN, so perhaps we have a physical issue.  We'll see what
happens with this pass.


* Re: ext4 won't mount - fsck required - 2nd fsck in less than a week
  2012-09-10  2:53   ` Terry
@ 2012-09-10  3:18     ` Terry
  2012-09-10 13:48       ` Terry
  0 siblings, 1 reply; 12+ messages in thread
From: Terry @ 2012-09-10  3:18 UTC
  To: Theodore Ts'o; +Cc: linux-ext4

While I am waiting for fsck to finish, another thought.  This
filesystem contains a lot of small files: 35,867,642 to be exact.
Is there anything else I should check or know to ensure smooth
operation for this type of filesystem?  I formatted it with the
standard RHEL 6 options.


* Re: ext4 won't mount - fsck required - 2nd fsck in less than a week
  2012-09-10  3:18     ` Terry
@ 2012-09-10 13:48       ` Terry
  2012-09-10 13:56         ` Terry
  0 siblings, 1 reply; 12+ messages in thread
From: Terry @ 2012-09-10 13:48 UTC
  To: Theodore Ts'o; +Cc: linux-ext4

fsck completed and fixed a lot of things.  The file system then mounted
without any errors.  We are still getting these types of errors in
/var/log/messages:

Sep 10 08:40:49 narf kernel: EXT4-fs error (device dm-6):
ext4_dx_find_entry: bad entry in directory #743966900: directory entry
across blocks - block=2975876794offset=0(946176), inode=1414751737,
rec_len=45724, name_len=206

Thoughts?
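
One way to read this message: a directory entry at offset 0 claims a record
length of 45724 bytes, which is more than a block can hold, so the entry
cannot fit in its block.  Below is a rough Python sketch of sanity checks of
this kind.  It is modeled loosely on what the message describes rather than
copied from the kernel; the 4096-byte block size, the function names, and
the returned strings are assumptions made for illustration.

```python
# Illustrative directory entry sanity checks (not kernel code).
# Assumption: 4096-byte blocks; the thread does not state the block size.

BLOCK_SIZE = 4096
HEADER_LEN = 8  # inode (4 bytes) + rec_len (2) + name_len (1) + file_type (1)


def min_rec_len(name_len):
    """Smallest record holding the header plus the name, rounded up to 4."""
    return (name_len + HEADER_LEN + 3) & ~3


def check_dir_entry(offset, rec_len, name_len, block_size=BLOCK_SIZE):
    """Return a description of the first problem found, or None if none."""
    if rec_len < min_rec_len(1):
        return "rec_len is smaller than minimal"
    if rec_len % 4 != 0:
        return "rec_len is not a multiple of 4"
    if rec_len < min_rec_len(name_len):
        return "rec_len is too small for name_len"
    if offset + rec_len > block_size:
        return "directory entry across blocks"
    return None


if __name__ == "__main__":
    # Values taken from the log line above.
    print(check_dir_entry(offset=0, rec_len=45724, name_len=206))
```

With the values from the log line, the entry passes the size and alignment
checks and fails only because it would run past the end of its block.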


* Re: ext4 won't mount - fsck required - 2nd fsck in less than a week
  2012-09-10 13:48       ` Terry
@ 2012-09-10 13:56         ` Terry
  2012-09-11 16:22           ` Terry
  0 siblings, 1 reply; 12+ messages in thread
From: Terry @ 2012-09-10 13:56 UTC
  To: Theodore Ts'o; +Cc: linux-ext4

Hold that thought.  This is another filesystem.  Let me fix that one
then come back to this problem if it still exists.


* Re: ext4 won't mount - fsck required - 2nd fsck in less than a week
  2012-09-10 13:56         ` Terry
@ 2012-09-11 16:22           ` Terry
  2012-09-11 17:00             ` Theodore Ts'o
  2012-09-11 17:59             ` Lukáš Czerner
  0 siblings, 2 replies; 12+ messages in thread
From: Terry @ 2012-09-11 16:22 UTC
  To: Theodore Ts'o; +Cc: linux-ext4

OK, I fixed the other filesystem (dm-6) yesterday.  Today we are still
getting these errors on it:
Sep 11 11:17:47 omadvnfs01a kernel: EXT4-fs error (device dm-6):
ext4_mb_generate_buddy: EXT4-fs: group 90851: 0 blocks in bitmap, 5048
in gd
Sep 11 11:18:17 omadvnfs01a kernel: EXT4-fs error (device dm-6):
ext4_mb_generate_buddy: EXT4-fs: group 90670: 0 blocks in bitmap, 6665
in gd
Sep 11 11:19:31 omadvnfs01a kernel: EXT4-fs error (device dm-6):
ext4_mb_generate_buddy: EXT4-fs: group 37589: 420 blocks in bitmap,
8302 in gd
Sep 11 11:19:31 omadvnfs01a kernel: EXT4-fs error (device dm-6):
ext4_mb_generate_buddy: EXT4-fs: group 71777: 7071 blocks in bitmap,
23711 in gd
Sep 11 11:19:31 omadvnfs01a kernel: EXT4-fs error (device dm-6):
ext4_mb_generate_buddy: EXT4-fs: group 71778: 10664 blocks in bitmap,
26624 in gd
Sep 11 11:19:39 omadvnfs01a kernel: EXT4-fs error (device dm-6):
ext4_mb_generate_buddy: EXT4-fs: group 13499: 9884 blocks in bitmap,
1256 in gd
Sep 11 11:19:39 omadvnfs01a kernel: EXT4-fs error (device dm-6):
ext4_mb_generate_buddy: EXT4-fs: group 13498: 383 blocks in bitmap,
384 in gd
Sep 11 11:19:39 omadvnfs01a kernel: EXT4-fs error (device dm-6):
ext4_mb_generate_buddy: EXT4-fs: group 13496: 2356 blocks in bitmap,
10453 in gd
Sep 11 11:19:39 omadvnfs01a kernel: EXT4-fs error (device dm-6):
ext4_mb_generate_buddy: EXT4-fs: group 13497: 3593 blocks in bitmap,
5641 in gd
Sep 11 11:19:50 omadvnfs01a kernel: EXT4-fs error (device dm-6):
ext4_mb_generate_buddy: EXT4-fs: group 49528: 25850 blocks in bitmap,
29946 in gd
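
Each of these lines reports, for one block group, a count derived from the
block bitmap next to the count recorded in the group descriptor ("gd"); the
two are expected to match.  As a small aid for triage, the following Python
sketch pulls the group number and both counts out of log text like the above
and orders the groups by the size of the mismatch.  The regular expression
is written against the lines quoted here and is an assumption about the
format, not a specification of it.

```python
import re

# Matches "group N: X blocks in bitmap, Y in gd" after whitespace is
# normalized (the mail client wrapped the original log lines).
PATTERN = re.compile(r"group (\d+): (\d+) blocks in bitmap,\s+(\d+) in gd")


def parse_buddy_errors(log_text):
    """Return a list of (group, bitmap_count, gd_count) tuples."""
    text = " ".join(log_text.split())  # undo line wrapping
    return [tuple(int(x) for x in m.groups()) for m in PATTERN.finditer(text)]


def largest_mismatches(entries):
    """Sort entries by absolute difference between the two counts."""
    return sorted(entries, key=lambda e: abs(e[1] - e[2]), reverse=True)


# Three of the lines quoted above, wrapped as they appear in the mail.
SAMPLE = """\
Sep 11 11:17:47 omadvnfs01a kernel: EXT4-fs error (device dm-6):
ext4_mb_generate_buddy: EXT4-fs: group 90851: 0 blocks in bitmap, 5048
in gd
Sep 11 11:19:39 omadvnfs01a kernel: EXT4-fs error (device dm-6):
ext4_mb_generate_buddy: EXT4-fs: group 13499: 9884 blocks in bitmap,
1256 in gd
Sep 11 11:19:39 omadvnfs01a kernel: EXT4-fs error (device dm-6):
ext4_mb_generate_buddy: EXT4-fs: group 13498: 383 blocks in bitmap,
384 in gd
"""

if __name__ == "__main__":
    for group, in_bitmap, in_gd in largest_mismatches(parse_buddy_errors(SAMPLE)):
        print(group, in_bitmap, in_gd, abs(in_bitmap - in_gd))
```

Groups that differ by thousands of blocks, or that show 0 in the bitmap,
point to more than an off-by-one accounting slip.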


* Re: ext4 won't mount - fsck required - 2nd fsck in less than a week
  2012-09-11 16:22           ` Terry
@ 2012-09-11 17:00             ` Theodore Ts'o
  2012-09-11 17:07               ` Terry
  2012-09-11 17:59             ` Lukáš Czerner
  1 sibling, 1 reply; 12+ messages in thread
From: Theodore Ts'o @ 2012-09-11 17:00 UTC
  To: Terry; +Cc: linux-ext4

You haven't said what kernel you are using, but this really does look
like a storage device issue.  You mentioned this was a RHEL cluster?
Have you considered opening a support ticket with Red Hat?

Regards,

					- Ted


* Re: ext4 won't mount - fsck required - 2nd fsck in less than a week
  2012-09-11 17:00             ` Theodore Ts'o
@ 2012-09-11 17:07               ` Terry
  2012-09-11 18:06                 ` Theodore Ts'o
  0 siblings, 1 reply; 12+ messages in thread
From: Terry @ 2012-09-11 17:07 UTC
  To: Theodore Ts'o; +Cc: linux-ext4

On Tue, Sep 11, 2012 at 12:00 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> You haven't said what kernel you are using, but this really does look
> like a storage device issue.  You mentioned this was a RHEL cluster?
> Have you considered opening a support ticket with Red Hat?
>
> Regards,
>
>                                         - Ted

I just did that. RHEL 6.3 with kernel 2.6.32-279.5.2.el6.x86_64


* Re: ext4 won't mount - fsck required - 2nd fsck in less than a week
  2012-09-11 17:07               ` Terry
@ 2012-09-11 18:06                 ` Theodore Ts'o
  2012-09-11 18:16                   ` Eric Sandeen
  0 siblings, 1 reply; 12+ messages in thread
From: Theodore Ts'o @ 2012-09-11 18:06 UTC
  To: Terry; +Cc: linux-ext4

On Tue, Sep 11, 2012 at 12:07:14PM -0500, Terry wrote:
> 
> RHEL 6.3 with kernel 2.6.32-279.5.2.el6.x86_64

I'll let Eric or Lukas comment, but as far as I know the ext4 in the
RHEL 6 kernels has been quite stable (there are a lot of bug fixes
that have been backported to the RHEL 6 kernel, and while it doesn't
have some of the newer ext4 features, it doesn't have any of the more
exciting bugs that might come with the newer features :-).

So I would strongly suspect that the SAN or the SAN-attached storage
is flaky and is causing the file system corruption that leads the
kernel and e2fsck to complain.

Regards,

						- Ted


* Re: ext4 won't mount - fsck required - 2nd fsck in less than a week
  2012-09-11 18:06                 ` Theodore Ts'o
@ 2012-09-11 18:16                   ` Eric Sandeen
  0 siblings, 0 replies; 12+ messages in thread
From: Eric Sandeen @ 2012-09-11 18:16 UTC
  To: Theodore Ts'o; +Cc: Terry, linux-ext4

On 9/11/12 1:06 PM, Theodore Ts'o wrote:
> On Tue, Sep 11, 2012 at 12:07:14PM -0500, Terry wrote:
>>
>> RHEL 6.3 with kernel 2.6.32-279.5.2.el6.x86_64
> 
> I'll let Eric or Lukas comment, but as far as I know the ext4 in the
> RHEL 6 kernels has been quite stable (there are a lot of bug fixes
> that have been backported to the RHEL 6 kernel, and while it doesn't
> have some of the newer ext4 features, it doesn't have any of the more
> exciting bugs that might come with the newer features :-).
> 
> So I really would strongly suspect the SAN or the SAN-attached storage
> as being flaky, causing the file system corruptions which is leading
> to the kernel and e2fsck complaining.

Right, ext4 is the default in RHEL 6 and well tended.  We'll let support
look at the case and see what's going on.  But I'd certainly troubleshoot
the SAN as well, especially to be sure HA is set up properly and the
filesystem is not being mounted twice on two nodes, etc.

-Eric
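
The double-mount check suggested above could be scripted roughly as follows.
This is a hypothetical Python sketch: it assumes the contents of
/proc/mounts have already been collected from each cluster node by some
other means, and that the shared volume shows up under the same device path
on every node.  The node names and the device path in the example are made
up.

```python
def nodes_with_device_mounted(device, mounts_by_node):
    """Return the sorted names of nodes whose mount table lists `device`.

    `mounts_by_node` maps a node name to the text of that node's
    /proc/mounts, where the first whitespace-separated field of each
    line is the mounted device.
    """
    nodes = []
    for node, mounts_text in mounts_by_node.items():
        for line in mounts_text.splitlines():
            fields = line.split()
            if fields and fields[0] == device:
                nodes.append(node)
                break
    return sorted(nodes)


if __name__ == "__main__":
    # Made-up data for three hypothetical nodes.
    example = {
        "node-a": "/dev/mapper/vg_san-lv_data /data ext4 rw,relatime 0 0\n",
        "node-b": "/dev/mapper/vg_san-lv_data /data ext4 rw,relatime 0 0\n",
        "node-c": "/dev/sda1 /boot ext4 rw 0 0\n",
    }
    mounted = nodes_with_device_mounted("/dev/mapper/vg_san-lv_data", example)
    if len(mounted) > 1:
        print("mounted on more than one node:", mounted)
```

A non-cluster filesystem such as ext4 that is mounted read-write on two
nodes at once is a classic source of the kind of corruption discussed in
this thread, so an empty or single-node result here helps rule that out.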




* Re: ext4 won't mount - fsck required - 2nd fsck in less than a week
  2012-09-11 16:22           ` Terry
  2012-09-11 17:00             ` Theodore Ts'o
@ 2012-09-11 17:59             ` Lukáš Czerner
  1 sibling, 0 replies; 12+ messages in thread
From: Lukáš Czerner @ 2012-09-11 17:59 UTC
  To: Terry; +Cc: Theodore Ts'o, linux-ext4

Hi, what RHEL version are you using or, even better, what kernel
version?  If you have a RHEL subscription, you should definitely
contact Red Hat about the issue.

Thanks!
-Lukas



Thread overview: 12+ messages
2012-09-10  2:34 ext4 won't mount - fsck required - 2nd fsck in less than a week Terry
2012-09-10  2:47 ` Theodore Ts'o
2012-09-10  2:53   ` Terry
2012-09-10  3:18     ` Terry
2012-09-10 13:48       ` Terry
2012-09-10 13:56         ` Terry
2012-09-11 16:22           ` Terry
2012-09-11 17:00             ` Theodore Ts'o
2012-09-11 17:07               ` Terry
2012-09-11 18:06                 ` Theodore Ts'o
2012-09-11 18:16                   ` Eric Sandeen
2012-09-11 17:59             ` Lukáš Czerner
