EXT4-fs: group descriptors corrupted!

* EXT4-fs: group descriptors corrupted!
@ 2009-02-25 20:39 Ron Johnson
  2009-02-25 21:30 ` Theodore Tso
  0 siblings, 1 reply; 10+ messages in thread
From: Ron Johnson @ 2009-02-25 20:39 UTC (permalink / raw)
  To: Linux-Ext4

Hi,


I get an error that seems to be the same as in this thread:
http://kerneltrap.org/mailarchive/linux-ext4/2009/1/5/4598534

Below is the link to a dumpe2fs.

$ dmesg | grep EXT4
[   45.995261] EXT4-fs: barriers enabled
[   46.014833] EXT4 FS on hda1, internal journal on hda1:8
[   46.014873] EXT4-fs: delayed allocation enabled
[   46.014912] EXT4-fs: file extents enabled
[   46.015883] EXT4-fs: mballoc enabled
[   46.015926] EXT4-fs: mounted filesystem with ordered data mode.
[ 1392.640482] EXT4-fs: ext4_check_descriptors: Block bitmap for group 0 not in group (block 3120627712)!
[ 1392.640490] EXT4-fs: group descriptors corrupted!

These are the relevant Debian Sid package versions:
e2fsprogs                1.41.3-1
linux-source-2.6.28      2.6.28-2~snapshot.12850

Since I built the fs with extents enabled, I am hesitant to run fsck 
on it because I've read that it doesn't yet support extents.

As you can see from the dmesg snippets, one of my ext4 file systems 
mounted perfectly.  The one that didn't is an lvm2 "array", which 
seems to be consistent.

# lvdisplay
     Logging initialised at Wed Feb 25 14:31:23 2009
     Set umask to 0077
lvdisplay    Finding all logical volumes
lvdisplay  --- Logical volume ---
lvdisplay  LV Name                /dev/main_huge_vg/main_huge_lv
lvdisplay  VG Name                main_huge_vg
lvdisplay  LV UUID                Pgrlks-mtmc-GuYh-kvPU-Mr78-w9b6-uykW8A
lvdisplay  LV Write Access        read/write
lvdisplay  LV Status              available
lvdisplay  # open                 0
lvdisplay  LV Size                2.69 TB
lvdisplay  Current LE             22023
lvdisplay  Segments               9
lvdisplay  Allocation             inherit
lvdisplay  Read ahead sectors     auto
lvdisplay  - currently set to     256
lvdisplay  Block device           254:0
lvdisplay
lvdisplay    Wiping internal VG cache

# dumpe2fs -f /dev/main_huge_vg/main_huge_lv | head -n800 > main_huge_lv.dump.txt
dumpe2fs 1.41.3 (12-Oct-2008)
ext2fs_read_bb_inode: Invalid argument
http://members.cox.net/ron.l.johnson/main_huge_lv.dump.txt
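
A quick sanity check on the kernel message: the reported bitmap block can be compared with the size of the logical volume shown by lvdisplay. This is only a sketch. It assumes "TB" means either 2**40 or 10**12 bytes and tries the common ext4 block sizes, since the block size of this filesystem is not quoted in the message.

```python
# Sketch: can block 3120627712 exist on a 2.69 TB logical volume?
# Assumptions: the unit behind "TB" and the filesystem block size are not
# stated in the message, so both readings and several block sizes are tried.

REPORTED_BLOCK = 3120627712             # from the ext4_check_descriptors message
LV_SIZE_TB = 2.69                       # "LV Size" from lvdisplay (rounded value)
EXT4_BLOCK_SIZES = [1024, 2048, 4096]   # common ext4 block sizes, in bytes


def max_blocks(lv_bytes: float, block_size: int) -> int:
    """Upper bound on the number of filesystem blocks the volume can hold."""
    return int(lv_bytes // block_size)


def block_fits(block: int, lv_bytes: float, block_size: int) -> bool:
    """True if the block number could lie inside a volume of lv_bytes bytes."""
    return block < max_blocks(lv_bytes, block_size)


for unit_name, unit in (("2**40", 2**40), ("10**12", 10**12)):
    lv_bytes = LV_SIZE_TB * unit
    for block_size in EXT4_BLOCK_SIZES:
        print(unit_name, block_size, max_blocks(lv_bytes, block_size),
              block_fits(REPORTED_BLOCK, lv_bytes, block_size))
```

Under every combination tried here the block number lies past the end of the volume, which fits a descriptor holding garbage rather than a slightly wrong value.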

TIA

-- 
Ron Johnson, Jr.
Jefferson LA  USA


* Re: EXT4-fs: group descriptors corrupted!
  2009-02-25 20:39 Ron Johnson
@ 2009-02-25 21:30 ` Theodore Tso
  2009-02-25 21:47   ` Ron Johnson
  0 siblings, 1 reply; 10+ messages in thread
From: Theodore Tso @ 2009-02-25 21:30 UTC (permalink / raw)
  To: Ron Johnson; +Cc: Linux-Ext4

On Wed, Feb 25, 2009 at 02:39:31PM -0600, Ron Johnson wrote:
> These are the relevant Debian Sid package versions:
> e2fsprogs                1.41.3-1
> linux-source-2.6.28      2.6.28-2~snapshot.12850
>
> Since I built the fs with extents enabled, I am hesitant to run fsck on 
> it because I've read that it doesn't yet support extents.
>
> As you can see from the dmesg snippets, one of my ext4 file systems  
> mounted perfectly.  The one that didn't is an lvm2 "array", which seems 
> to be consistent.

E2fsck 1.41.3 does actually have extent support, but let's not be too
hasty to run e2fsck just yet.

Can you send me the output of "dumpe2fs -o superblock=32768
/dev/main_huge_vg/main_huge_lv" and see how it compares to the
dumpe2fs of your primary superblock?

It looks like your block group descriptors were totally scribbled
upon.  How and why it happened, I'm not sure.  But before we do
anything else, let's check out the backup descriptors and make sure
they are sane.
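
The 32768 in the dumpe2fs command is consistent with the usual layout of a filesystem with 4096-byte blocks and 32768 blocks per group: with sparse_super, backup superblocks sit at the start of group 1 and of groups that are powers of 3, 5 and 7. The sketch below lists those locations. The geometry is an assumption here, since the parameters of this filesystem are not quoted in the message.

```python
from __future__ import annotations

# Sketch: where a sparse_super filesystem keeps its backup superblocks.
# Assumptions (not stated in the message): 32768 blocks per group and a
# first data block of 0, which is what 4096-byte blocks normally give.


def backup_groups(group_count: int) -> list[int]:
    """Groups holding a backup superblock: 1 and powers of 3, 5 and 7."""
    groups = {1} if group_count > 1 else set()
    for base in (3, 5, 7):
        power = base
        while power < group_count:
            groups.add(power)
            power *= base
    return sorted(groups)


def backup_superblock_blocks(group_count: int, blocks_per_group: int = 32768,
                             first_block: int = 0) -> list[int]:
    """Block numbers that could be passed to dumpe2fs -o superblock=N."""
    return [first_block + group * blocks_per_group
            for group in backup_groups(group_count)]


# Hypothetical filesystem with 100 block groups.
print(backup_superblock_blocks(100)[:5])
```

If the backup at 32768 were also damaged, the later entries in the list would be the next candidates to inspect.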

						- Ted


* Re: EXT4-fs: group descriptors corrupted!
  2009-02-25 21:30 ` Theodore Tso
@ 2009-02-25 21:47   ` Ron Johnson
  2009-02-25 23:18     ` Theodore Tso
  0 siblings, 1 reply; 10+ messages in thread
From: Ron Johnson @ 2009-02-25 21:47 UTC (permalink / raw)
  To: Linux-Ext4

On 02/25/2009 03:30 PM, Theodore Tso wrote:
> On Wed, Feb 25, 2009 at 02:39:31PM -0600, Ron Johnson wrote:
>> These are the relevant Debian Sid package versions:
>> e2fsprogs                1.41.3-1
>> linux-source-2.6.28      2.6.28-2~snapshot.12850
>>
>> Since I built the fs with extents enabled, I am hesitant to run fsck on 
>> it because I've read that it doesn't yet support extents.
>>
>> As you can see from the dmesg snippets, one of my ext4 file systems  
>> mounted perfectly.  The one that didn't is an lvm2 "array", which seems 
>> to be consistent.
> 
> E2fsck 1.41.3 does actually have extent support, but let's not be too
> hasty to run e2fsck just yet.
> 
> Can you send me the output of "dumpe2fs -o superblock=32768
> /dev/main_huge_vg/main_huge_lv" and see how it compares to the
> dumpe2fs of your primary superblock?

# dumpe2fs -f -o superblock=32768 /dev/main_huge_vg/main_huge_lv | head -n800 > main_huge_lv.dump.sb32768.txt
dumpe2fs 1.41.3 (12-Oct-2008)
ext2fs_read_bb_inode: Invalid argument

http://members.cox.net/ron.l.johnson/main_huge_lv.dump.sb32768.txt

> It looks like your block group descriptors were totally scribbled
> upon.

The system shut down cleanly.  (Or, at least, I didn't notice 
anything obviously wrong.)

>        How and why it happened, I'm not sure.  But before we do
> anything else, let's check out the backup descriptors and make sure
> they are sane.

-- 
Ron Johnson, Jr.
Jefferson LA  USA

The feeling of disgust at seeing a human female in a Relationship
with a chimp male is Homininphobia, and you should be ashamed of
yourself.


* Re: EXT4-fs: group descriptors corrupted!
  2009-02-25 21:47   ` Ron Johnson
@ 2009-02-25 23:18     ` Theodore Tso
  2009-02-25 23:41       ` Greg Freemyer
  2009-02-25 23:42       ` Ron Johnson
  0 siblings, 2 replies; 10+ messages in thread
From: Theodore Tso @ 2009-02-25 23:18 UTC (permalink / raw)
  To: Ron Johnson; +Cc: Linux-Ext4

Huh.  OK, there's something really strange going on here.

The kernel never updates the backup superblock; that's by design, to
avoid corruption problems.  So for example, on my laptop, if I run
dumpe2fs on my root partition, I see this:

Filesystem created:       Fri Feb 13 09:00:02 2009
Last mount time:          Tue Feb 24 14:34:19 2009
Last write time:          Tue Feb 24 14:34:19 2009
Mount count:              3
Maximum mount count:      30
Last checked:             Sat Feb 14 10:46:41 2009
Check interval:           15552000 (6 months)
Next check after:         Thu Aug 13 11:46:41 2009

However, if I run dumpe2fs -o superblock=32768 on my root partition,
I'll see this:

Filesystem created:       Fri Feb 13 09:00:02 2009
Last mount time:          Fri Feb 13 11:22:06 2009
Last write time:          Sat Feb 14 10:47:11 2009
Mount count:              0
Maximum mount count:      30
Last checked:             Sat Feb 14 10:46:41 2009
Check interval:           15552000 (6 months)
Next check after:         Thu Aug 13 11:46:41 2009

Note the difference in the "last write time" and the "last mount
time".  That's because normally we avoid touching the backup
superblocks.
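
The comparison between the two listings above can be done mechanically. The following is a minimal sketch that parses the "Name: value" header lines and reports the fields that differ; the helper names are made up for illustration and are not part of e2fsprogs.

```python
# Sketch: diff the header fields of a primary and a backup superblock dump.
# The sample text is the laptop output quoted above; parse_header and
# differing_fields are hypothetical helper names.

PRIMARY = """\
Filesystem created:       Fri Feb 13 09:00:02 2009
Last mount time:          Tue Feb 24 14:34:19 2009
Last write time:          Tue Feb 24 14:34:19 2009
Mount count:              3
Maximum mount count:      30
Last checked:             Sat Feb 14 10:46:41 2009
Check interval:           15552000 (6 months)
Next check after:         Thu Aug 13 11:46:41 2009
"""

BACKUP = """\
Filesystem created:       Fri Feb 13 09:00:02 2009
Last mount time:          Fri Feb 13 11:22:06 2009
Last write time:          Sat Feb 14 10:47:11 2009
Mount count:              0
Maximum mount count:      30
Last checked:             Sat Feb 14 10:46:41 2009
Check interval:           15552000 (6 months)
Next check after:         Thu Aug 13 11:46:41 2009
"""


def parse_header(text: str) -> dict:
    """Split each 'Name:   value' line at the first colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def differing_fields(primary: str, backup: str) -> dict:
    """Map each field that differs to its (primary, backup) pair of values."""
    a, b = parse_header(primary), parse_header(backup)
    return {key: (a.get(key), b.get(key))
            for key in sorted(a.keys() | b.keys())
            if a.get(key) != b.get(key)}


for name, (primary_value, backup_value) in differing_fields(PRIMARY, BACKUP).items():
    print(f"{name}: primary={primary_value!r} backup={backup_value!r}")
```

On a healthy filesystem the mount and write fields are expected to differ, as they do here; an empty result is the suspicious case.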

Now let's take a look at your dumpe2fs output.  In your case, we see
the following:

Filesystem created:       Thu Jan 22 19:33:20 2009
Last mount time:          Fri Jan 23 16:23:58 2009
Last write time:          Sun Feb 22 02:31:02 2009
Mount count:              1
Maximum mount count:      24
Last checked:             Fri Jan 23 16:19:49 2009
Check interval:           15552000 (6 months)
Next check after:         Wed Jul 22 17:19:49 2009

and it's the same on both the primary and backup (dumpe2fs -o
superblock=32768).  The question is how the heck did *that* happen?
As I mentioned, the kernel doesn't even have code to touch the backup
superblock.  That would tend to implicate one of the e2fsprogs tools,
or something using the e2fsprogs libraries --- but the recent
libraries (and you're using e2fsprogs 1.41.x) also avoid touching the
backup superblocks.  The only tools that could have done it from
e2fsprogs userland are e2fsck, tune2fs, and resize2fs, and that
doesn't explain how the values turned out to be pure garbage.

Does the "last write" timestamp suggest anything to you?  What
was happening on the system at or around Sun Feb 22 02:31:02 2009?
Maybe if we can localize this down to what userspace program caused
the problem, it'll be a hint.

(This is why I didn't want you to run e2fsck just yet; if you had, it
would have overwritten the last write time, which could be a valuable
clue as to what is causing this problem.)

As far as how to recover your data, what I would recommend doing is
creating a writeable LVM snapshot, with a pretty good amount of space.
Then try running the command "mke2fs -S " on the snapshot, with
*precisely* the same mke2fs arguments and /etc/mke2fs.conf that you
used to create the filesystem in the first place.  Then cross your
fingers, run e2fsck on the snapshot, and see how much of the data you
can recover; some of it may end up in lost+found, but hopefully you'll
get most of the data back.  If it works on the snapshot, only then try it
on the real LVM.  If it doesn't work out on the snapshot, you can
always discard it and try again without further corrupting any of your
original filesystem.
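
The order of the recovery steps above can be written out as a command list. The sketch below only builds and prints the commands; it runs nothing. The snapshot name rescue_snap, the 50G size and the ["-t", "ext4"] arguments are placeholders, because the original mke2fs arguments are not given in this thread and they must match exactly. The -f flag on e2fsck is also an assumption.

```python
import shlex

# Sketch: the snapshot-first recovery sequence as printable commands.
# Nothing is executed. All names, sizes and mke2fs arguments passed in
# below are placeholders, not values taken from the thread.


def recovery_plan(origin_lv, snapshot_name, snapshot_size, mke2fs_args):
    """Return the three commands, in order, as argument lists."""
    vg_dir = origin_lv.rsplit("/", 1)[0]
    snapshot = f"{vg_dir}/{snapshot_name}"
    return [
        # 1. Writable snapshot, so the original volume is never modified.
        ["lvcreate", "--snapshot", "--size", snapshot_size,
         "--name", snapshot_name, origin_lv],
        # 2. Rewrite only the superblock and group descriptors.
        ["mke2fs", "-S", *mke2fs_args, snapshot],
        # 3. Let e2fsck rebuild what it can from the surviving metadata.
        ["e2fsck", "-f", snapshot],
    ]


plan = recovery_plan("/dev/main_huge_vg/main_huge_lv", "rescue_snap",
                     "50G", ["-t", "ext4"])
for command in plan:
    print(shlex.join(command))
```

Only if the result on the snapshot looks good would the last two commands be repeated against the real volume.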

Good luck, and thanks in advance for any information you can give
us to help track down this problem.  At this point I'm going to guess
that it's a nasty e2fsprogs bug, where somehow the internal in-memory
version of the block group descriptors got corrupted, and then got
written out to disk.  But this is just a guess at this point --- and
I'm still left wondering why I haven't seen it on my systems and on my
regression testing.

    	       		       	       	    - Ted


* Re: EXT4-fs: group descriptors corrupted!
  2009-02-25 23:18     ` Theodore Tso
@ 2009-02-25 23:41       ` Greg Freemyer
  2009-02-25 23:42       ` Ron Johnson
  1 sibling, 0 replies; 10+ messages in thread
From: Greg Freemyer @ 2009-02-25 23:41 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Ron Johnson, Linux-Ext4, Ric Wheeler

Smart ass comment about the new ATA spec intentionally top-posted.

Question:  How do you know those sectors did not somehow get
discarded, then modified behind the scenes by an SSD, then fixated to
new deterministic values by a read?

Answer: Because devices that do that aren't shipping yet.

Damn the future looks good from here.


-- 
Greg Freemyer
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com


* Re: EXT4-fs: group descriptors corrupted!
  2009-02-25 23:18     ` Theodore Tso
  2009-02-25 23:41       ` Greg Freemyer
@ 2009-02-25 23:42       ` Ron Johnson
  2009-02-26  0:05         ` Andreas Dilger
  1 sibling, 1 reply; 10+ messages in thread
From: Ron Johnson @ 2009-02-25 23:42 UTC (permalink / raw)
  To: Linux-Ext4

On 02/25/2009 05:18 PM, Theodore Tso wrote:
> Huh.  OK, there's something really strange going on here.
> 
> The kernel never updates the backup superblock; that's by design, to
> avoid corruption problems.  So for example, on my laptop, if I run
> dumpe2fs on my root partition, I see this:
> 
> Filesystem created:       Fri Feb 13 09:00:02 2009
> Last mount time:          Tue Feb 24 14:34:19 2009
> Last write time:          Tue Feb 24 14:34:19 2009
> Mount count:              3
> Maximum mount count:      30
> Last checked:             Sat Feb 14 10:46:41 2009
> Check interval:           15552000 (6 months)
> Next check after:         Thu Aug 13 11:46:41 2009
> 
> However, if I run dumpe2fs -o superblock=32768 on my root partition,
> I'll see this:
> 
> Filesystem created:       Fri Feb 13 09:00:02 2009
> Last mount time:          Fri Feb 13 11:22:06 2009
> Last write time:          Sat Feb 14 10:47:11 2009
> Mount count:              0
> Maximum mount count:      30
> Last checked:             Sat Feb 14 10:46:41 2009
> Check interval:           15552000 (6 months)
> Next check after:         Thu Aug 13 11:46:41 2009
> 
> Note the difference in the "last write time" and the "last mount
> time".  That's because normally we avoid touching the backup
> superblocks.
> 
> Now let's take a look at your dumpe2fs output.  In your case, we see
> the following:
> 
> Filesystem created:       Thu Jan 22 19:33:20 2009
> Last mount time:          Fri Jan 23 16:23:58 2009
> Last write time:          Sun Feb 22 02:31:02 2009
> Mount count:              1
> Maximum mount count:      24
> Last checked:             Fri Jan 23 16:19:49 2009
> Check interval:           15552000 (6 months)
> Next check after:         Wed Jul 22 17:19:49 2009
> 
> and it's the same on both the primary and backup (dumpe2fs -o
> superblock=32768).  The question is how the heck did *that* happen?
> As I mentioned, the kernel doesn't even have code to touch the backup
> superblock.  That would tend to implicate one of the e2fsprogs tools,
> or something using the e2fsprogs libraries --- but the recent
> libraries (and you're using e2fsprogs 1.41.x) also avoid touching the
> backup superblocks.  The only tools that could have done it from
> e2fsprogs userland are e2fsck, tune2fs, and resize2fs, and that
> doesn't explain how the values turned out to be pure garbage.
> 
> Does the "last write" timestamp suggest anything to you?  What
> was happening on the system at or around Sun Feb 22 02:31:02 2009?
> Maybe if we can localize this down to what userspace program caused
> the problem, it'll be a hint.

That's about 10 hours before I rebooted the machine, middle of a 
Saturday night...

I performed a rather large apt-get upgrade at around 01:30, but that 
would have only touched /, not my "big data" directory.  ~/Documents 
is symlinked into /data/big/Documents, so I might have been editing 
an OOo document, or copying a YouTube file to it, but nothing comes 
to mind.

> (This is why I didn't want you to run e2fsck just yet; if you had, it
> would have overwritten the last write time, which could be a valuable
> clue as to what is causing this problem.)
> 
> As far as how to recover your data, what I would recommend doing is
> creating a writeable LVM snapshot, with a pretty good amount of space.

Sorry, but I don't have *any* unallocated space left.

> Then try running the command "mke2fs -S " on the snapshot, with
> *precisely* the same mke2fs arguments and /etc/mke2fs.conf that you
> used to create the filesystem in the first place.  Then cross your
> fingers, run e2fsck on the snapshot, and see how much of the data you
> can recover; some of it may end up in lost+found, but hopefully you'll
> get most of the data back.  If it works on the snapshot, only then try it
> on the real LVM.  If it doesn't work out on the snapshot, you can
> always discard it and try again without further corrupting any of your
> original filesystem.
> 
> Good luck, and thanks in advance for any information you can give
> us to help track down this problem.  At this point I'm going to guess
> that it's a nasty e2fsprogs bug, where somehow the internal in-memory

I'm sure that I didn't run any "e2" app on a mounted device!

> version of the block group descriptors got corrupted, and then got
> written out to disk.  But this is just a guess at this point --- and
> I'm still left wondering why I haven't seen it on my systems and on my
> regression testing.

Note that this only happened on a reboot.  I had mounted & unmounted 
this device many times while learning about lvm2, adding files, 
resizing-expanding the fs, adding more files, etc.  But that only 
took two days, and then it "sat" there for almost 4 weeks with no 
problems.

-- 
Ron Johnson, Jr.
Jefferson LA  USA



* Re: EXT4-fs: group descriptors corrupted!
  2009-02-25 23:42       ` Ron Johnson
@ 2009-02-26  0:05         ` Andreas Dilger
  2009-02-26  0:23           ` Ron Johnson
  0 siblings, 1 reply; 10+ messages in thread
From: Andreas Dilger @ 2009-02-26  0:05 UTC (permalink / raw)
  To: Ron Johnson; +Cc: Linux-Ext4

On Feb 25, 2009  17:42 -0600, Ron Johnson wrote:
> On 02/25/2009 05:18 PM, Theodore Tso wrote:
>> Now let's take a look at your dumpe2fs output.  In your case, we see
>> the following:
>>
>> Filesystem created:       Thu Jan 22 19:33:20 2009
>> Last mount time:          Fri Jan 23 16:23:58 2009
>> Last write time:          Sun Feb 22 02:31:02 2009
>> Mount count:              1
>> Maximum mount count:      24
>> Last checked:             Fri Jan 23 16:19:49 2009
>> Check interval:           15552000 (6 months)
>> Next check after:         Wed Jul 22 17:19:49 2009
>>
>> and it's the same on both the primary and backup (dumpe2fs -o
>> superblock=32768).  The question is how the heck did *that* happen?
>> As I mentioned, the kernel doesn't even have code to touch the backup
>> superblock.

Except online resizing?  It HAS to update the backup superblocks,
otherwise if the primary gets corrupted the backup will not have
the right total blocks count and anything beyond the old blocks
count might be lost...

>> Does the "last write" timestamp suggest anything to you?  What
>> was happening on the system at or around Sun Feb 22 02:31:02 2009?
>> Maybe if we can localize this down to what userspace program caused
>> the problem, it'll be a hint.
>
> That's about 10 hours before I rebooted the machine, middle of a  
> Saturday night...

Please take time zones into account also.

> I performed a rather large apt-get upgrade at around 01:30, but that  
> would have only touched /, not my "big data" directory. ~/Documents  is 
> symlinked into /data/big/Documents, so I might have been editing an OOo 
> document, or copying a YouTube file to it, but nothing pops into mind.

This might have happened AFTER your reboot, by e2fsck or similar?

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



* Re: EXT4-fs: group descriptors corrupted!
  2009-02-26  0:05         ` Andreas Dilger
@ 2009-02-26  0:23           ` Ron Johnson
  0 siblings, 0 replies; 10+ messages in thread
From: Ron Johnson @ 2009-02-26  0:23 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Linux-Ext4

On 02/25/2009 06:05 PM, Andreas Dilger wrote:
> On Feb 25, 2009  17:42 -0600, Ron Johnson wrote:
>> On 02/25/2009 05:18 PM, Theodore Tso wrote:
>>> Now let's take a look at your dumpe2fs output.  In your case, we see
>>> the following:
>>>
>>> Filesystem created:       Thu Jan 22 19:33:20 2009
>>> Last mount time:          Fri Jan 23 16:23:58 2009
>>> Last write time:          Sun Feb 22 02:31:02 2009
>>> Mount count:              1
>>> Maximum mount count:      24
>>> Last checked:             Fri Jan 23 16:19:49 2009
>>> Check interval:           15552000 (6 months)
>>> Next check after:         Wed Jul 22 17:19:49 2009
>>>
>>> and it's the same on both the primary and backup (dumpe2fs -o
>>> superblock=32768).  The question is how the heck did *that* happen?
>>> As I mentioned, the kernel doesn't even have code to touch the backup
>>> superblock.
> 
> Except online resizing?  It HAS to update the backup superblocks,
> otherwise if the primary gets corrupted the backup will not have
> the right total blocks count and anything beyond the old blocks
> count might be lost...
> 
>>> Does the "last write" timestamp suggest anything to you?  What
>>> was happening on the system at or around Sun Feb 22 02:31:02 2009?
>>> Maybe if we can localize this down to what userspace program caused
>>> the problem, it'll be a hint.
>> That's about 10 hours before I rebooted the machine, middle of a  
>> Saturday night...
> 
> Please take time zones into account also.
> 
>> I performed a rather large apt-get upgrade at around 01:30, but that  
>> would have only touched /, not my "big data" directory. ~/Documents  is 
>> symlinked into /data/big/Documents, so I might have been editing an OOo 
>> document, or copying a YouTube file to it, but nothing pops into mind.
> 
> This might have happened AFTER your reboot, by e2fsck or similar?

Since I'm at -0600, that would have been Sat Feb 21 20:31:02 2009 
CST, and I'd have been watching TV, or some such.  *Maybe* dumping a 
movie to disk from DVD with mplayer.
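
The conversion above can be reproduced with a fixed -0600 offset. This sketch follows the assumption made in this message that the dumpe2fs timestamp is in UTC; the thread does not establish whether dumpe2fs printed UTC or local time.

```python
from datetime import datetime, timedelta, timezone

# Sketch: shift the "Last write time" stamp from UTC to a fixed -0600 offset.
# Assumption: the stamp printed by dumpe2fs is UTC, which is not confirmed.

STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"
CST = timezone(timedelta(hours=-6), "CST")


def utc_stamp_to_cst(stamp: str) -> str:
    """Parse a dumpe2fs-style timestamp as UTC and render it at -0600."""
    utc_time = datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)
    return utc_time.astimezone(CST).strftime(STAMP_FORMAT)


print(utc_stamp_to_cst("Sun Feb 22 02:31:02 2009"))
```

If dumpe2fs had in fact printed local time, no shift would apply and the write would fall in the early hours of Sunday instead.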

-- 
Ron Johnson, Jr.
Jefferson LA  USA



* EXT4-fs: group descriptors corrupted!
@ 2009-03-07  6:55 Christian
  2009-03-09 16:13 ` Christian
  0 siblings, 1 reply; 10+ messages in thread
From: Christian @ 2009-03-07  6:55 UTC (permalink / raw)
  To: linux-ext4

Hello,

I have an EXT4 filesystem on a raid6 that I created with 2.6.28.  After
the initial creation of the partition I later did 3 filesystem expands
after adding drives.  I didn't have a problem until I recently had a
kernel panic and rebooted.  I think the kernel panic was related to my
nvidia driver.  After rebooting the raid no longer mounted, and dmesg
reported:

EXT4-fs: ext4_check_descriptors: Inode bitmap for group 0 not in group (block 3245938880)!
EXT4-fs: group descriptors corrupted!

After reading several threads online, I attempted a fsck pointing to a
backup superblock and now I receive this error:

EXT4-fs: ext4_check_descriptors: Checksum for group 0 failed (7390!=34008)
EXT4-fs: group descriptors corrupted!

Using dd if=/dev/md1 | strings, I can see a number of the files on the disk.

Running Gentoo: 

Linux server1 2.6.28.5 #5 SMP Mon Feb 23 00:52:10 EST 2009 x86_64
Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz GenuineIntel GNU/Linux

e2fsprogs 1.41.3

6GB memory
6GB swap


# dumpe2fs /dev/md1
dumpe2fs 1.41.3 (12-Oct-2008)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          1b9e0aec-79b4-48e1-b801-54a2792ef9b3
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index
filetype extent flex_bg sparse_super large_file huge_file uninit_bg
dir_nlink extra_isize
Filesystem flags:         signed_directory_hash 
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              549429248
Block count:              2197703904
Reserved block count:     0
Free blocks:              1027127451
Free inodes:              545908446
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      500
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       Wed Feb 18 16:40:04 2009
Last mount time:          Wed Feb 18 17:07:50 2009
Last write time:          Fri Mar  6 22:11:15 2009
Mount count:              1
Maximum mount count:      35
Last checked:             Wed Feb 18 16:40:04 2009
Check interval:           15552000 (6 months)
Next check after:         Mon Aug 17 17:40:04 2009
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      27ba6512-da53-49ba-abdf-f79299a6eba2
Journal backup:           inode blocks
Journal size:             128M


 # dumpe2fs -o superblock=32768 /dev/md1
 dumpe2fs 1.41.3 (12-Oct-2008)
 Filesystem volume name:   <none>
 Last mounted on:          <not available>
 Filesystem UUID:          1b9e0aec-79b4-48e1-b801-54a2792ef9b3
 Filesystem magic number:  0xEF53
 Filesystem revision #:    1 (dynamic)
 Filesystem features:      has_journal ext_attr resize_inode dir_index
 filetype needs_recovery extent flex_bg sparse_super large_file
 huge_file uninit_bg dir_nlink extra_isize
 Filesystem flags:         signed_directory_hash 
 Default mount options:    (none)
 Filesystem state:         clean
 Errors behavior:          Continue
 Filesystem OS type:       Linux
 Inode count:              549429248
 Block count:              2197703904
 Reserved block count:     0
 Free blocks:              1027127451
 Free inodes:              545908446
 First block:              0
 Block size:               4096
 Fragment size:            4096
 Reserved GDT blocks:      500
 Blocks per group:         32768
 Fragments per group:      32768
 Inodes per group:         8192
 Inode blocks per group:   512
 Flex block group size:    16
 Filesystem created:       Wed Feb 18 16:40:04 2009
 Last mount time:          Wed Feb 18 17:07:50 2009
 Last write time:          Wed Feb 18 17:07:50 2009
 Mount count:              1
 Maximum mount count:      35
 Last checked:             Wed Feb 18 16:40:04 2009
 Check interval:           15552000 (6 months)
 Next check after:         Mon Aug 17 17:40:04 2009
 Reserved blocks uid:      0 (user root)
 Reserved blocks gid:      0 (group root)
 First inode:              11
 Inode size:               256
 Required extra isize:     28
 Desired extra isize:      28
 Journal inode:            8
 Default directory hash:   half_md4
 Directory Hash Seed:      27ba6512-da53-49ba-abdf-f79299a6eba2
 Journal backup:           inode blocks
 Journal size:             128M
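
The numbers in the dumps above are enough to see why the first mount attempt was rejected. The sketch below is loosely modelled on the range check that ext4_check_descriptors performs; it is an approximation written for illustration, not the kernel code. It assumes that with flex_bg a bitmap may sit anywhere inside the filesystem, and otherwise only inside its own group.

```python
# Sketch: is a bitmap block number plausible for a given block group?
# Approximation of the kernel's sanity check, using the Block count and
# Blocks per group values from the dumpe2fs output above.

BLOCK_COUNT = 2197703904
BLOCKS_PER_GROUP = 32768
REPORTED_INODE_BITMAP = 3245938880   # from the kernel message for group 0


def groups_count(blocks_count, blocks_per_group, first_block=0):
    """Number of block groups, rounding the last partial group up."""
    return -(-(blocks_count - first_block) // blocks_per_group)


def bitmap_block_in_range(block, group, blocks_count, blocks_per_group,
                          first_block=0, flex_bg=False):
    """True if the block lies where a bitmap for this group may live."""
    last_group = groups_count(blocks_count, blocks_per_group, first_block) - 1
    if flex_bg:
        low, high = first_block, blocks_count - 1
    else:
        low = first_block + group * blocks_per_group
        high = blocks_count - 1 if group == last_group else low + blocks_per_group - 1
    return low <= block <= high


print(groups_count(BLOCK_COUNT, BLOCKS_PER_GROUP))
print(bitmap_block_in_range(REPORTED_INODE_BITMAP, 0, BLOCK_COUNT,
                            BLOCKS_PER_GROUP, flex_bg=True))
```

The reported block is larger than the block count of the whole filesystem, so it fails even the relaxed flex_bg form of the check.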


I'm stuck right now and hopefully I can recover the filesystem.

Any help would be appreciated.

Thanks



* Re: EXT4-fs: group descriptors corrupted!
  2009-03-07  6:55 EXT4-fs: group descriptors corrupted! Christian
@ 2009-03-09 16:13 ` Christian
  0 siblings, 0 replies; 10+ messages in thread
From: Christian @ 2009-03-09 16:13 UTC (permalink / raw)
  To: linux-ext4

I fixed this problem using fsck, but I will detail what occurred in case
someone else comes across the same problem.

Running fsck -n on the disk gave me the following:

fsck 1.41.4 (27-Jan-2009)
Group descriptor 0 checksum is invalid.  Fix? no
Group descriptor 1 checksum is invalid.  Fix? no
Group descriptor 2 checksum is invalid.  Fix? no
[33,000+ of these..]
Group descriptor 33534 checksum is invalid.  Fix? no
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

Free blocks count wrong for group #0 (23513, counted=1880).
Fix? no
Free blocks count wrong for group #1 (31743, counted=684).
Fix? no
[58,000+ of these..]
Free blocks count wrong for group #58287 (32254, counted=5095).
Fix? no

Free inodes count wrong for group #0 (8181, counted=0).
Fix? no
Directories count wrong for group #0 (2, counted=110).
Fix? no
[751 of these..]
Free inodes count wrong for group #913 (8192, counted=3376).
Fix? no
Directories count wrong for group #913 (0, counted=160).
Fix? no
Free inodes count wrong for group #928 (8192, counted=4034).
Fix? no
Directories count wrong for group #928 (0, counted=548).
Fix? no
Free inodes count wrong (545908446, counted=546390043).
Fix? no

/dev/md1: ********** WARNING: Filesystem still has errors **********

/dev/md1: 3520802/549429248 files (5.2% non-contiguous),
1170576453/2197703904 blocks

Next was running fsck -y with fingers crossed..

fsck 1.41.4 (27-Jan-2009)
Group descriptor 0 checksum is invalid.  Fix? yes

[same thing as above but yes..]

Directories count wrong for group #928 (0, counted=548).
Fix? yes
Free inodes count wrong (549429237, counted=546390043).
Fix? yes
/dev/md1: ***** FILE SYSTEM WAS MODIFIED *****
/dev/md1: 3039205/549429248 files (6.0% non-contiguous),
1252924969/2197703904 blocks
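
The summary lines of the two runs can be cross-checked against the superblock counters from the dumpe2fs output in the first message. A small worked calculation, using only numbers that appear in this thread:

```python
# Sketch: reconcile the fsck summary lines with the superblock counters.
# All figures are copied from the dumpe2fs and fsck output in this thread.

INODE_COUNT = 549429248
BLOCK_COUNT = 2197703904

SUPERBLOCK_FREE_INODES = 545908446   # dumpe2fs "Free inodes"
SUPERBLOCK_FREE_BLOCKS = 1027127451  # dumpe2fs "Free blocks"

FSCK_N_FILES, FSCK_N_BLOCKS = 3520802, 1170576453   # summary of fsck -n
FSCK_Y_FILES, FSCK_Y_BLOCKS = 3039205, 1252924969   # summary of fsck -y
COUNTED_FREE_INODES = 546390043                     # "counted=" in pass 5


def free_count(total: int, used: int) -> int:
    """Free objects implied by a 'used/total' summary line."""
    return total - used


# The read-only run reports the stale counters stored in the superblock.
print(free_count(INODE_COUNT, FSCK_N_FILES) == SUPERBLOCK_FREE_INODES)
print(free_count(BLOCK_COUNT, FSCK_N_BLOCKS) == SUPERBLOCK_FREE_BLOCKS)
# After the repair, the file count matches what pass 5 counted.
print(free_count(INODE_COUNT, FSCK_Y_FILES) == COUNTED_FREE_INODES)
print(free_count(BLOCK_COUNT, FSCK_Y_BLOCKS))
```

So the fsck -n summary simply echoes the on-disk counters, while the fsck -y summary reflects the recounted values.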

After this the file system was back to normal, with no data loss.  I'm
thinking that something screwy occurred during the 3 times that I ran
resize2fs (live) to expand the ext4 filesystem.  I initially started
with 5 1.5TB drives, and then added three more 1.5TB drives one-by-one
running resize2fs each time.  During all of this I never unmounted the
device or rebooted the system.

-Christian

