* ext4 won't mount - fsck required - 2nd fsck in less than a week
From: Terry @ 2012-09-10 2:34 UTC
To: linux-ext4

Hello,

As the subject says, we have a 15 TB fsck drive that won't mount with these errors:

Sep 9 20:02:20 narf kernel: EXT4-fs (dm-9): ext4_check_descriptors: Inode bitmap for group 3200 not in group (block 4161027887)!
Sep 9 20:02:20 narf kernel: EXT4-fs (dm-9): group descriptors corrupted!

We did a proactive fsck on Tuesday of last week because it was starting to give filesystem errors. It ran through and mounted fine.

The filesystem lives on an EqualLogic SAN spread across 36 drives. Could this be something with the physical layer, or is it not abnormal to have to run multiple rounds of fsck to fully fix an issue?

Thanks!
* Re: ext4 won't mount - fsck required - 2nd fsck in less than a week
From: Theodore Ts'o @ 2012-09-10 2:47 UTC
To: Terry; +Cc: linux-ext4

On Sun, Sep 09, 2012 at 09:34:10PM -0500, Terry wrote:
>
> As the subject says, we have a 15 TB fsck drive that won't mount with
> these errors:
>
> Sep 9 20:02:20 narf kernel: EXT4-fs (dm-9): ext4_check_descriptors:
> Inode bitmap for group 3200 not in group (block 4161027887)!
> Sep 9 20:02:20 narf kernel: EXT4-fs (dm-9): group descriptors corrupted!

These indicate a very basic file system corruption where the block group descriptors are corrupted. E2fsck will complain immediately upon seeing this sort of fs inconsistency, and the first thing it will try to do is fix it.

> We did a proactive fsck on Tuesday of last week because it was
> starting to give filesystem errors. It ran through and mounted fine.
>
> The filesystem lives on an EqualLogic SAN spread across 36 drives.
> Could this be something with the physical layer or is it not abnormal
> to have to run multiple rounds of fsck to fully fix an issue?

This is most probably a hardware problem; normally e2fsck will fix file system corruptions (and certainly problems such as corrupt block group descriptors) in a single pass. If e2fsck finished and the file system mounted fine last week, and now you're getting this kind of error, it basically screams some kind of physical layer problem, or perhaps a bad hard drive, or perhaps the SAN disk is getting incorrectly written to by some other system, etc.

- Ted
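As a rough illustration of the kind of range check behind the "Inode bitmap for group 3200 not in group" message, the Python sketch below tests whether a bitmap block number falls inside the block range a group's metadata may occupy. Everything numeric here is an assumption, since the thread does not give the mkfs parameters: a 4 KiB block size, 32768 blocks per group, a first data block of 0, and a filesystem of exactly 15 TiB. The function names are hypothetical, and the flex_bg rule (metadata may live anywhere in the filesystem) is a simplified reading of ext4 behavior, not something stated in this thread.

```python
BLOCK_SIZE = 4096          # assumed; not stated in the thread
BLOCKS_PER_GROUP = 32768   # assumed default for 4 KiB blocks (8 * block size)
FIRST_DATA_BLOCK = 0       # assumed; typical for 4 KiB blocks


def allowed_block_range(group, total_blocks, flex_bg):
    """Return (first, last) block numbers where a group's bitmaps may live.

    Without flex_bg the bitmaps must sit inside the group's own block range.
    With flex_bg they may sit anywhere in the filesystem (assumed behavior).
    """
    if flex_bg:
        return FIRST_DATA_BLOCK, total_blocks - 1
    first = FIRST_DATA_BLOCK + group * BLOCKS_PER_GROUP
    last = min(first + BLOCKS_PER_GROUP - 1, total_blocks - 1)
    return first, last


def bitmap_in_range(bitmap_block, group, total_blocks, flex_bg):
    """True if the bitmap block lies inside the allowed range for the group."""
    first, last = allowed_block_range(group, total_blocks, flex_bg)
    return first <= bitmap_block <= last


if __name__ == "__main__":
    total = 15 * 2**40 // BLOCK_SIZE  # assumes exactly 15 TiB
    for flex in (False, True):
        ok = bitmap_in_range(4161027887, 3200, total, flex)
        print(f"flex_bg={flex}: bitmap block in range = {ok}")
```

Under these assumed sizes, the reported block 4161027887 lies outside the allowed range in both modes, which is consistent with a descriptor holding garbage rather than a plausible block number. With different real parameters the exact ranges would change.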
* Re: ext4 won't mount - fsck required - 2nd fsck in less than a week
From: Terry @ 2012-09-10 2:53 UTC
To: Theodore Ts'o; +Cc: linux-ext4

On Sun, Sep 9, 2012 at 9:47 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> This is most probably a hardware problem; normally e2fsck will fix
> file system corruptions (and certainly problems such as corrupt block
> group descriptors) in a single pass. If e2fsck finished and the file
> system mounted fine last week, and now you're getting this kind of
> error, it basically screams some kind of physical layer problem, or
> perhaps a bad hard drive, or perhaps the SAN disk is getting
> incorrectly written to by some other system, etc.
>
> - Ted

Thanks for the reply. It is part of a RHEL cluster, but we did not have any situations where multiple systems mounted the filesystem. It is an old SAN, so perhaps we have a physical issue. We'll see what happens with this pass.
* Re: ext4 won't mount - fsck required - 2nd fsck in less than a week
From: Terry @ 2012-09-10 3:18 UTC
To: Theodore Ts'o; +Cc: linux-ext4

On Sun, Sep 9, 2012 at 9:53 PM, Terry <td3201@gmail.com> wrote:
> Thanks for the reply. It is part of a RHEL cluster but we did not
> have any situations where multiple systems mounted the filesystem. It
> is an old SAN so perhaps we have a physical issue. We'll see what
> happens with this pass.

While I am waiting for fsck to finish, another thought. This filesystem contains a lot of small files: 35,867,642 files, to be exact. Is there anything else I should check or know to ensure smooth operation for these types of filesystems? I formatted them with standard RHEL 6 options.
* Re: ext4 won't mount - fsck required - 2nd fsck in less than a week
From: Terry @ 2012-09-10 13:48 UTC
To: Theodore Ts'o; +Cc: linux-ext4

On Sun, Sep 9, 2012 at 10:18 PM, Terry <td3201@gmail.com> wrote:
> While I am waiting for fsck to finish, another thought. This
> filesystem contains a lot of small files. 35,867,642 files to be
> exact. Anything else I should check or know to ensure a smooth
> operation for these types of filesystems? I formatted them with
> standard RHEL 6 options.

FSCK completed, fixing a lot of things. The file system then mounted without any errors. We are still getting these types of errors in /var/log/messages:

Sep 10 08:40:49 narf kernel: EXT4-fs error (device dm-6): ext4_dx_find_entry: bad entry in directory #743966900: directory entry across blocks - block=2975876794offset=0(946176), inode=1414751737, rec_len=45724, name_len=206

Thoughts?
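To show why a record like the one in the `ext4_dx_find_entry` message gets rejected, here is a minimal Python sketch of the sanity checks typically applied to an ext4 directory entry. The check order and message strings are modeled from memory on the kernel's directory-entry validation and should be treated as an approximation. The 4096-byte block size is an assumption (the thread does not state it), and `check_dir_entry` and `dir_rec_len` are hypothetical names.

```python
def dir_rec_len(name_len):
    """Minimum record length for a name: 8-byte header plus name, rounded up to 4."""
    return (name_len + 8 + 3) & ~3


def check_dir_entry(offset, rec_len, name_len, inode,
                    block_size=4096, inodes_count=None):
    """Return a description of the first failed check, or None if the entry looks sane.

    block_size defaults to 4096, which is an assumption for this filesystem.
    inodes_count is optional because the thread does not report it.
    """
    if rec_len < dir_rec_len(1):
        return "rec_len is smaller than minimal"
    if rec_len % 4 != 0:
        return "rec_len % 4 != 0"
    if rec_len < dir_rec_len(name_len):
        return "rec_len is too small for name_len"
    if offset + rec_len > block_size:
        return "directory entry across blocks"
    if inodes_count is not None and inode > inodes_count:
        return "inode out of bounds"
    return None


if __name__ == "__main__":
    # Values taken from the log line in the message above.
    print(check_dir_entry(offset=0, rec_len=45724, name_len=206,
                          inode=1414751737))
```

With the logged values, a record starting at offset 0 with `rec_len=45724` would extend far past the end of a 4096-byte block, so the "directory entry across blocks" check is the one that fires. A record length that large suggests the directory block holds unrelated data rather than a slightly damaged entry.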
* Re: ext4 won't mount - fsck required - 2nd fsck in less than a week
From: Terry @ 2012-09-10 13:56 UTC
To: Theodore Ts'o; +Cc: linux-ext4

On Mon, Sep 10, 2012 at 8:48 AM, Terry <td3201@gmail.com> wrote:
> FSCK completed fixing a lot of things. The file system then mounted
> without any errors. We are still getting these types of errors in
> /var/log/messages:
>
> Sep 10 08:40:49 narf kernel: EXT4-fs error (device dm-6):
> ext4_dx_find_entry: bad entry in directory #743966900: directory entry
> across blocks - block=2975876794offset=0(946176), inode=1414751737,
> rec_len=45724, name_len=206
>
> Thoughts?

Hold that thought. This is another filesystem. Let me fix that one, then come back to this problem if it still exists.
* Re: ext4 won't mount - fsck required - 2nd fsck in less than a week
From: Terry @ 2012-09-11 16:22 UTC
To: Theodore Ts'o; +Cc: linux-ext4

On Mon, Sep 10, 2012 at 8:56 AM, Terry <td3201@gmail.com> wrote:
> Hold that thought. This is another filesystem. Let me fix that one
> then come back to this problem if it still exists.

OK, I fixed the other filesystem (dm-6) yesterday. Today, we are still getting these errors on it:

Sep 11 11:17:47 omadvnfs01a kernel: EXT4-fs error (device dm-6): ext4_mb_generate_buddy: EXT4-fs: group 90851: 0 blocks in bitmap, 5048 in gd
Sep 11 11:18:17 omadvnfs01a kernel: EXT4-fs error (device dm-6): ext4_mb_generate_buddy: EXT4-fs: group 90670: 0 blocks in bitmap, 6665 in gd
Sep 11 11:19:31 omadvnfs01a kernel: EXT4-fs error (device dm-6): ext4_mb_generate_buddy: EXT4-fs: group 37589: 420 blocks in bitmap, 8302 in gd
Sep 11 11:19:31 omadvnfs01a kernel: EXT4-fs error (device dm-6): ext4_mb_generate_buddy: EXT4-fs: group 71777: 7071 blocks in bitmap, 23711 in gd
Sep 11 11:19:31 omadvnfs01a kernel: EXT4-fs error (device dm-6): ext4_mb_generate_buddy: EXT4-fs: group 71778: 10664 blocks in bitmap, 26624 in gd
Sep 11 11:19:39 omadvnfs01a kernel: EXT4-fs error (device dm-6): ext4_mb_generate_buddy: EXT4-fs: group 13499: 9884 blocks in bitmap, 1256 in gd
Sep 11 11:19:39 omadvnfs01a kernel: EXT4-fs error (device dm-6): ext4_mb_generate_buddy: EXT4-fs: group 13498: 383 blocks in bitmap, 384 in gd
Sep 11 11:19:39 omadvnfs01a kernel: EXT4-fs error (device dm-6): ext4_mb_generate_buddy: EXT4-fs: group 13496: 2356 blocks in bitmap, 10453 in gd
Sep 11 11:19:39 omadvnfs01a kernel: EXT4-fs error (device dm-6): ext4_mb_generate_buddy: EXT4-fs: group 13497: 3593 blocks in bitmap, 5641 in gd
Sep 11 11:19:50 omadvnfs01a kernel: EXT4-fs error (device dm-6): ext4_mb_generate_buddy: EXT4-fs: group 49528: 25850 blocks in bitmap, 29946 in gd
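For anyone wanting to summarize `ext4_mb_generate_buddy` messages like these from a log file, the following Python sketch extracts the group number and the two counts from each line and reports the difference. The regular expression is written against the exact message layout shown in this thread and may need adjusting for other kernel versions. The function name `parse_buddy_errors` is hypothetical. The reading of the two numbers as "free blocks counted from the on-disk bitmap" versus "free blocks recorded in the group descriptor" is an interpretation, not something the thread states.

```python
import re

# Matches the message layout seen in this thread; \s+ tolerates line wrapping.
BUDDY_RE = re.compile(
    r"EXT4-fs error \(device (?P<dev>[\w-]+)\):\s+"
    r"ext4_mb_generate_buddy: EXT4-fs: group (?P<group>\d+):\s+"
    r"(?P<bitmap>\d+) blocks in bitmap,\s+(?P<gd>\d+)\s+in gd"
)


def parse_buddy_errors(text):
    """Return one dict per ext4_mb_generate_buddy mismatch found in text."""
    results = []
    for m in BUDDY_RE.finditer(text):
        bitmap = int(m.group("bitmap"))
        gd = int(m.group("gd"))
        results.append({
            "device": m.group("dev"),
            "group": int(m.group("group")),
            "bitmap": bitmap,
            "gd": gd,
            "delta": gd - bitmap,  # positive: descriptor claims more than bitmap
        })
    return results


if __name__ == "__main__":
    sample = (
        "Sep 11 11:17:47 omadvnfs01a kernel: EXT4-fs error (device dm-6): "
        "ext4_mb_generate_buddy: EXT4-fs: group 90851: 0 blocks in bitmap, "
        "5048 in gd\n"
        "Sep 11 11:19:39 omadvnfs01a kernel: EXT4-fs error (device dm-6): "
        "ext4_mb_generate_buddy: EXT4-fs: group 13499: 9884 blocks in bitmap, "
        "1256 in gd\n"
    )
    for entry in parse_buddy_errors(sample):
        print(entry)
```

Sorting the results by group number can help show whether the mismatches cluster in particular regions of the device, which may be a useful detail to hand to the storage vendor or to support.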
* Re: ext4 won't mount - fsck required - 2nd fsck in less than a week
From: Theodore Ts'o @ 2012-09-11 17:00 UTC
To: Terry; +Cc: linux-ext4

You haven't said what kernel you are using, but this really does look like a storage device issue. You mentioned this was a RHEL cluster? Have you considered opening a support ticket with Red Hat?

Regards,

- Ted
* Re: ext4 won't mount - fsck required - 2nd fsck in less than a week
From: Terry @ 2012-09-11 17:07 UTC
To: Theodore Ts'o; +Cc: linux-ext4

On Tue, Sep 11, 2012 at 12:00 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> You haven't said what kernel you are using, but this really does look
> like a storage device issue. You mentioned this was a RHEL cluster?
> Have you considered opening a support ticket with Red Hat?
>
> Regards,
>
> - Ted

I just did that.

RHEL 6.3 with kernel 2.6.32-279.5.2.el6.x86_64
* Re: ext4 won't mount - fsck required - 2nd fsck in less than a week
From: Theodore Ts'o @ 2012-09-11 18:06 UTC
To: Terry; +Cc: linux-ext4

On Tue, Sep 11, 2012 at 12:07:14PM -0500, Terry wrote:
>
> RHEL 6.3 with kernel 2.6.32-279.5.2.el6.x86_64

I'll let Eric or Lukas comment, but as far as I know the ext4 in the RHEL 6 kernels has been quite stable (there are a lot of bug fixes that have been backported to the RHEL 6 kernel, and while it doesn't have some of the newer ext4 features, it doesn't have any of the more exciting bugs that might come with the newer features :-).

So I really would strongly suspect the SAN or the SAN-attached storage as being flaky, causing the file system corruptions which are leading to the kernel and e2fsck complaining.

Regards,

- Ted
* Re: ext4 won't mount - fsck required - 2nd fsck in less than a week
From: Eric Sandeen @ 2012-09-11 18:16 UTC
To: Theodore Ts'o; +Cc: Terry, linux-ext4

On 9/11/12 1:06 PM, Theodore Ts'o wrote:
> On Tue, Sep 11, 2012 at 12:07:14PM -0500, Terry wrote:
>>
>> RHEL 6.3 with kernel 2.6.32-279.5.2.el6.x86_64
>
> I'll let Eric or Lukas comment, but as far as I know the ext4 in the
> RHEL 6 kernels has been quite stable (there are a lot of bug fixes
> that have been backported to the RHEL 6 kernel, and while it doesn't
> have some of the newer ext4 features, it doesn't have any of the more
> exciting bugs that might come with the newer features :-).
>
> So I really would strongly suspect the SAN or the SAN-attached storage
> as being flaky, causing the file system corruptions which is leading
> to the kernel and e2fsck complaining.

Right, ext4 is the default in RHEL 6 and well tended. We'll let support look at the case and see what's going on. But I'd certainly troubleshoot the SAN as well, especially to be sure HA is set up properly and is not mounting the filesystem twice on two nodes, etc.

-Eric
* Re: ext4 won't mount - fsck required - 2nd fsck in less than a week
From: Lukáš Czerner @ 2012-09-11 17:59 UTC
To: Terry; +Cc: Theodore Ts'o, linux-ext4

On Tue, 11 Sep 2012, Terry wrote:
> Ok, fixed the other filesystem (dm-6) yesterday. Today, getting these
> errors still on it:
> Sep 11 11:17:47 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 90851: 0 blocks in bitmap, 5048
> in gd
> Sep 11 11:18:17 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 90670: 0 blocks in bitmap, 6665
> in gd
> Sep 11 11:19:31 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 37589: 420 blocks in bitmap,
> 8302 in gd
> Sep 11 11:19:31 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 71777: 7071 blocks in bitmap,
> 23711 in gd
> Sep 11 11:19:31 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 71778: 10664 blocks in bitmap,
> 26624 in gd
> Sep 11 11:19:39 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 13499: 9884 blocks in bitmap,
> 1256 in gd
> Sep 11 11:19:39 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 13498: 383 blocks in bitmap,
> 384 in gd
> Sep 11 11:19:39 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 13496: 2356 blocks in bitmap,
> 10453 in gd
> Sep 11 11:19:39 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 13497: 3593 blocks in bitmap,
> 5641 in gd
> Sep 11 11:19:50 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 49528: 25850 blocks in bitmap,
> 29946 in gd

Hi,

What RHEL version are you using or, even better, what kernel version are you using? If you have a RHEL subscription, you should definitely contact Red Hat about the issue.

Thanks!
-Lukas