* Once more: Recovering a damaged ext4 fs?
  From: J.D. Bakker @ 2009-03-27 20:41 UTC
  To: linux-ext4

Hi all,

My 4TB ext4 RAID-6 has been damaged again. Symptoms leading up to it were
very similar to the last time (see
http://article.gmane.org/gmane.comp.file-systems.ext4/11418 ): a process
attempted to delete a large (~2GB) file, resulting in a soft lockup with
the following call trace:

 [<ffffffff80526dd7>] ? _spin_lock+0x16/0x19
 [<ffffffff80317b49>] ? ext4_mb_init_cache+0x81c/0xa58
 [<ffffffff80281249>] ? __lru_cache_add+0x8e/0xb6
 [<ffffffff80279d37>] ? find_or_create_page+0x62/0x88
 [<ffffffff80317ec2>] ? ext4_mb_load_buddy+0x13d/0x326
 [<ffffffff80318385>] ? ext4_mb_free_blocks+0x2da/0x75e
 [<ffffffff802c02d7>] ? __find_get_block+0xc6/0x1bc
 [<ffffffff802feebb>] ? ext4_free_blocks+0x7f/0xb2
 [<ffffffff8031294b>] ? ext4_ext_truncate+0x3e3/0x854
 [<ffffffff80306e38>] ? ext4_truncate+0x67/0x5bd
 [<ffffffff8032594e>] ? jbd2_journal_dirty_metadata+0x124/0x146
 [<ffffffff80314d44>] ? __ext4_handle_dirty_metadata+0xac/0xb7
 [<ffffffff803024c1>] ? ext4_mark_iloc_dirty+0x432/0x4a9
 [<ffffffff80303177>] ? ext4_mark_inode_dirty+0x135/0x166
 [<ffffffff803074e0>] ? ext4_delete_inode+0x152/0x22e
 [<ffffffff8030738e>] ? ext4_delete_inode+0x0/0x22e
 [<ffffffff802b44ac>] ? generic_delete_inode+0x82/0x109
 [<ffffffff802acd44>] ? do_unlinkat+0xf7/0x150
 [<ffffffff802a380c>] ? vfs_read+0x11e/0x133
 [<ffffffff80527545>] ? page_fault+0x25/0x30
 [<ffffffff8020c0ea>] ? system_call_fastpath+0x16/0x1

Kernel is 2.6.29-rc6. Machine is still responsive to anything that doesn't
touch the ext4 file system, but fails to halt. Upon power cycling fsck
fails with:

 newraidfs: Superblock has an invalid ext3 journal (inode 8).
 CLEARED.
 *** ext3 journal has been deleted - filesystem is now ext2 only ***

 newraidfs: Note: if several inode or block bitmap blocks or part
 of the inode table require relocation, you may wish to try
 running e2fsck with the '-b 32768' option first. The problem
 may lie only with the primary block group descriptors, and
 the backup block group descriptors may be OK.

 newraidfs: Block bitmap for group 0 is not in group. (block 3273617603)
 newraidfs: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
 (i.e., without -a or -p options)

A manual e2fsck -nv /dev/md0 reported:

 e2fsck 1.41.4 (27-Jan-2009)
 ./e2fsck/e2fsck: Group descriptors look bad... trying backup blocks...
 Block bitmap for group 0 is not in group. (block 3273617603)
 Relocate? no
 Inode bitmap for group 0 is not in group. (block 3067860682)
 Relocate? no
 Inode table for group 0 is not in group. (block 3051956899)
 WARNING: SEVERE DATA LOSS POSSIBLE.
 Relocate? no
 Group descriptor 0 checksum is invalid. Fix? no
 Inode table for group 1 is not in group. (block 1842273247)
 WARNING: SEVERE DATA LOSS POSSIBLE.
 Relocate? no
 Group descriptor 1 checksum is invalid. Fix? no
 Inode bitmap for group 2 is not in group. (block 3148026909)
 Relocate? no
 Inode table for group 2 is not in group. (block 1321535690)
 WARNING: SEVERE DATA LOSS POSSIBLE.
 Relocate? no
 Group descriptor 2 checksum is invalid. Fix? no
 [...]
 ./e2fsck/e2fsck: Invalid argument while reading bad blocks inode
 This doesn't bode well, but we'll try to go on...
 Pass 1: Checking inodes, blocks, and sizes
 Illegal block number passed to ext2fs_test_block_bitmap #3051956899 for in-use block map
 Illegal block number passed to ext2fs_mark_block_bitmap #3051956899 for in-use block map
 Illegal block number passed to ext2fs_test_block_bitmap #3051956900 for in-use block map
 Illegal block number passed to ext2fs_mark_block_bitmap #3051956900 for in-use block map
 [...]

Full logs available at:
 http://lartmaker.nl/ext4/e2fsck-md0-20090327.txt
 http://lartmaker.nl/ext4/e2fsck-md0-32768-20090327.txt
 http://lartmaker.nl/ext4/e2fsck-md0-98304-20090327.txt

I've run dumpe2fs:
 http://lartmaker.nl/ext4/dumpe2fs-md0-20090327.txt
 http://lartmaker.nl/ext4/dumpe2fs-md0-32768-20090327.txt
 http://lartmaker.nl/ext4/dumpe2fs-md0-98304-20090327.txt
...but it worries me that all three start with "ext2fs_read_bb_inode:
Invalid argument".

This is linux-2.6.29-rc6 (x86_64) running on an Intel Core i7 920 processor
(quad core plus hyperthreading). Kernel config is
http://lartmaker.nl/ext4/kernel-config-20090327.txt ; dmesg is at
http://lartmaker.nl/ext4/dmesg-20090327.txt

So,
- is there a way to recover my file system? I do have backups of most data,
  but as my remote weeklies run on Saturdays I'd still lose a lot of work
- is ext4 on software raid-6 on x86_64 considered production stable? I have
  been getting these hangs almost monthly, which is a lot worse than my old
  ext3 software RAID.

Thanks,
JDB.
-- 
LART. 250 MIPS under one Watt. Free hardware design files.
http://www.lartmaker.nl/
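For reference: the "-b 32768" hint in the fsck output above relies on
knowing where the backup superblocks live. A dry-run mke2fs lists the
candidate locations without writing anything. A sketch, assuming the
filesystem was created with the default 4k block size (the exact locations
depend on the original mkfs options, so treat the output as candidates
only):

	# -n is a dry run: print the layout mke2fs *would* create,
	# including backup superblock locations, without touching the
	# device.
	mke2fs -n /dev/md0
	# Then point e2fsck at one of the listed backups, e.g.:
	e2fsck -b 32768 -B 4096 /dev/md0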
* Re: Once more: Recovering a damaged ext4 fs?
  From: Theodore Tso @ 2009-03-27 22:46 UTC
  To: J.D. Bakker; +Cc: linux-ext4

On Fri, Mar 27, 2009 at 09:41:21PM +0100, J.D. Bakker wrote:
> Hi all,
>
> My 4TB ext4 RAID-6 has been damaged again. Symptoms leading up to it
> were very similar to the last time (see
> http://article.gmane.org/gmane.comp.file-systems.ext4/11418 ): a process
> attempted to delete a large (~2GB) file, resulting in a soft lockup with
> the following call trace:
>
> [<ffffffff80526dd7>] ? _spin_lock+0x16/0x19
> [<ffffffff80317b49>] ? ext4_mb_init_cache+0x81c/0xa58
> [<ffffffff80281249>] ? __lru_cache_add+0x8e/0xb6
> [<ffffffff80279d37>] ? find_or_create_page+0x62/0x88
> [<ffffffff80317ec2>] ? ext4_mb_load_buddy+0x13d/0x326
> [<ffffffff80318385>] ? ext4_mb_free_blocks+0x2da/0x75e

Thanks, we've been trying to track this down. The hint that you were
trying to delete a large (~2 GB) file may be what I need to reproduce
it locally. If it happens again, could you try doing this:

	echo w > /proc/sysrq-trigger
	dmesg > /tmp/dmesg.txt

And send the output of dmesg.txt to us?

> Kernel is 2.6.29-rc6. Machine is still responsive to anything that
> doesn't touch the ext4 file system, but fails to halt. Upon power
> cycling fsck fails with:
>
> newraidfs: Superblock has an invalid ext3 journal (inode 8).
> CLEARED.
> *** ext3 journal has been deleted - filesystem is now ext2 only ***
>
> newraidfs: Note: if several inode or block bitmap blocks or part
> of the inode table require relocation, you may wish to try
> running e2fsck with the '-b 32768' option first. The problem
> may lie only with the primary block group descriptors, and
> the backup block group descriptors may be OK.
>
> newraidfs: Block bitmap for group 0 is not in group. (block 3273617603)

It's rather disturbing that there was this much damage done from what
looks like a deadlock condition. Others who have reported this soft
lockup condition haven't reported this kind of filesystem damage. I
wonder if it might be caused by power-cycling the box; if possible, I
do recommend that people use the reset button rather than power cycling
the box; it tends to be much safer and gentler on the machine.

> e2fsck 1.41.4 (27-Jan-2009)
> ./e2fsck/e2fsck: Group descriptors look bad... trying backup blocks...
> Block bitmap for group 0 is not in group. (block 3273617603)
> Relocate? no
> Inode bitmap for group 0 is not in group. (block 3067860682)
> Relocate? no
> Inode table for group 0 is not in group. (block 3051956899)
> WARNING: SEVERE DATA LOSS POSSIBLE.

I really don't know how to explain the fact that your primary and
backup superblocks are getting corrupted. This is a real puzzler for
me. As I think I've told you before, the kernel simply doesn't know how
to write to the backup superblocks.

> - is there a way to recover my file system? I do have backups of most
> data, but as my remote weeklies run on Saturdays I'd still lose a lot
> of work

Well, probably the best bet at this point is to use "mke2fs -S"; see
the man pages for more details. You need to make sure you give exactly
the same arguments to mke2fs that you used when you first created the
filesystem. The mke2fs.conf also needs to be exactly the same as when
the filesystem was originally created.
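A sketch of what such an invocation might look like. This is last-resort
surgery, since -S rewrites the superblock and group descriptors in place,
and the flags below are placeholders rather than JDB's actual mkfs
arguments:

	# -S rewrites only the superblock and group descriptors, leaving
	# the inode table alone. Every geometry option must match the
	# original mkfs run exactly; -b 4096 here is an assumed block size.
	mke2fs -S -t ext4 -b 4096 /dev/md0
	e2fsck -fy /dev/md0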
Given that your system seems to have this predilection to wipe out the
first part of your block group descriptors, what I would recommend is
backing up your block group descriptors like this:

	dd if=/dev/XXXX of=backup-bg.img bs=4k count=234

This will back up just your block group descriptors, and will allow you
to restore them later (although you will have to run e2fsck restoring
them).

The bigger question is how 16 4k blocks between block numbers 1 and 17
are getting overwritten by garbage. As I mentioned, I haven't seen
anything like this except from your system. Some others have reported
a soft lockup when doing an "rm -rf" of a large hierarchy, but they
haven't reported this kind of filesystem corruption. I haven't been
able to replicate it yet myself.

> - is ext4 on software raid-6 on x86_64 considered production stable? I
> have been getting these hangs almost monthly, which is a lot worse than
> my old ext3 software RAID.

Well, the softlockup bug you're seeing is a real one. A lot of people
aren't seeing it, but you clearly are seeing it, and so we need to track
it down. I guess by definition, the fact that you're seeing this bug
means it's not "production stable". On the other hand, a lot of people
have been using ext4 without seeing this bug, some of them in production
situations. The criteria for "production stable" are a little grey;
certainly no enterprise distribution is calling ext4 "production stable"
yet, although it's been released as a technology preview by some
distros. The problem is that a lot of these problems can only be found
when it starts getting tested by a large userbase, so this kind of early
testing is critical.

That being said, I don't want to see early testers losing data, since
that tends to scare them off from providing the testing that we so
critically need. Hence my suggestion of using dd to back up the block
group descriptor blocks. And if you're not willing to take the risk,
I'll completely understand your deciding that you need to switch back
to ext3. But if you are willing to continue testing, and helping us
find the root cause of the problem, we will be very grateful.

Best regards,

						- Ted

P.S. You were using a completely stock kernel, correct? No other
patches installed?
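The matching restore for that image, using the same hypothetical /dev/XXXX
placeholder, is the inverse dd followed by the forced e2fsck Ted mentions:

	# Write the saved descriptor blocks back, then let e2fsck
	# reconcile the rest of the filesystem against them.
	dd if=backup-bg.img of=/dev/XXXX bs=4k count=234
	e2fsck -f /dev/XXXX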
* Re: Once more: Recovering a damaged ext4 fs?
  From: J.D. Bakker @ 2009-03-27 23:47 UTC
  To: Theodore Tso; +Cc: linux-ext4

At 18:46 -0400 27-03-2009, Theodore Tso wrote:
>Thanks, we've been trying to track this down. The hint that you were
>trying to delete a large (~2 GB) file may be what I need to reproduce
>it locally.

For the record, I've had it (= the soft lockup) happen to me six times
now since I built the box last December. Three times the machine would
reboot/fsck without any issues, one time it had errors which were
fixable with fsck -y, and this is the second time I've gotten "WARNING:
SEVERE DATA LOSS POSSIBLE". Kernels involved were 2.6.28, 2.6.28.4 and
2.6.29-rc6, no patches, almost identical .config (i.e. upgraded through
'make oldconfig'). All six times the process experiencing the lockup was
trying to delete a file no smaller than 700MB which had just been read
from. This time the process was mythtranscode (which had obviously just
read the entire file); the last time it was an rm on a movie I'd just
finished watching.

>If it happens again, could you try doing this:
>
>	echo w > /proc/sysrq-trigger
>	dmesg > /tmp/dmesg.txt
>
>And send the output of dmesg.txt to us?

Will do.

>It's rather disturbing that there was this much damage done from what
>looks like a deadlock condition. Others who have reported this soft
>lockup condition haven't reported this kind of filesystem damage. I
>wonder if it might be caused by power-cycling the box; if possible, I
>do recommend that people use the reset button rather than power
>cycling the box; it tends to be much safer and gentler on the machine.

ACK. I have this nagging feeling that this time the damage was more
extensive because I waited only a few minutes before power cycling; my
last soft lockup was last Tuesday, and then I waited about half an hour
before reaching for the power button. [I have gotten into the habit of
power cycling vs resets, as my two ivtv TV grabber cards sometimes fail
to re-init their firmware after anything other than a cold boot
following a minute of power-off.]

>Given that your system seems to have this predilection to wipe out the
>first part of your block group descriptors, what I would recommend is
>backing up your block group descriptors like this:
>
>	dd if=/dev/XXXX of=backup-bg.img bs=4k count=234
>
>This will back up just your block group descriptors, and will allow you
>to restore them later (although you will have to run e2fsck restoring
>them).

Thanks, will add that to my nightly backup. The last sentence should
read "...run e2fsck *after* restoring them", right?

>The bigger question is how 16 4k blocks between block numbers 1 and 17
>are getting overwritten by garbage. As I mentioned, I haven't seen
>anything like this except from your system. Some others have reported
>a soft lockup when doing an "rm -rf" of a large hierarchy, but they
>haven't reported this kind of filesystem corruption. I haven't been
>able to replicate it yet myself.

I have a few suspects, but no hard evidence beyond that. Two of the six
drives in my (Linux) software RAID-6 hang off a Marvell SATA/SAS RAID
controller. Support for that chip (mvsas) is very recent, and I'll
Google around to see if the BIOS has a habit of scribbling over data
blocks.
I pretty much never reboot the machine other than to get out of hangs,
so it's not impossible that the soft lockups are a red herring. As I
mentioned before, I am running the (closed) NVidia X drivers, but during
none of the hangs have I done anything more challenging than watching
xterms under fvwm. Other than that the entire setup (CPU, MB et al.) is
reasonably bleeding edge, but I don't see why that should manifest
itself in this particular way (as opposed to, say, video glitches or
compiler SIG11s).

>And if you're not willing to take the risk, I'll completely understand
>your deciding that you need to switch back to ext3. But if you are
>willing to continue testing, and helping us find the root cause
>of the problem, we will be very grateful.

I'd prefer to stay with ext4, as its benefits make sense for the
simulations I'm running. The downside is that this is my main
home/office server and MythTV backend; not only is restoring from
backup tedious, but I'll also have to explain to my SO that the RAID
ate her shows.

>P.S. You were using a completely stock kernel, correct? No other
>patches installed?

Yes.

Thanks,
JDB. [off to read up on mke2fs -S]
-- 
LART. 250 MIPS under one Watt. Free hardware design files.
http://www.lartmaker.nl/
* Re: Once more: Recovering a damaged ext4 fs?
  From: Theodore Tso @ 2009-03-28 4:06 UTC
  To: J.D. Bakker; +Cc: linux-ext4

This patch *might* solve your problem. I can't be sure because I
haven't been able to reproduce the soft lockup problem when rm'ing a
file yet. But if it's happening fairly often, it might be worth a try;
it definitely fixes a real bug in ext4 --- I'm just not sure it's
*your* bug. :-)

						- Ted

commit 73cda61b58a060b6691791a44c01c16155617451
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Fri Mar 27 19:43:21 2009 -0400

    ext4: fix locking typo in mballoc which could cause soft lockup hangs

    Smatch (http://repo.or.cz/w/smatch.git/) complains about the locking
    in ext4_mb_add_n_trim() from fs/ext4/mballoc.c

    4438  list_for_each_entry_rcu(tmp_pa, &lg->lg_prealloc_list[order],
    4439                          pa_inode_list) {
    4440          spin_lock(&tmp_pa->pa_lock);
    4441          if (tmp_pa->pa_deleted) {
    4442                  spin_unlock(&pa->pa_lock);
    4443                  continue;
    4444          }

    Brown paper bag time...

    Reported-by: Dan Carpenter <error27@gmail.com>
    Reviewed-by: Eric Sandeen <sandeen@redhat.com>
    Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@gmail.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    Cc: stable@kernel.org

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 4f2f476..12d1081 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4389,7 +4389,7 @@ static void ext4_mb_add_n_trim(struct ext4_allocation_context *ac)
 			pa_inode_list) {
 		spin_lock(&tmp_pa->pa_lock);
 		if (tmp_pa->pa_deleted) {
-			spin_unlock(&pa->pa_lock);
+			spin_unlock(&tmp_pa->pa_lock);
 			continue;
 		}
 		if (!added && pa->pa_free < tmp_pa->pa_free) {
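Why a one-character lock typo produces a hang: the pa_deleted branch
releases pa->pa_lock instead of the tmp_pa->pa_lock it has just taken, so
that preallocation's spinlock is left held forever, and any later path
that tries to take it (such as the mballoc code running during a large
delete) spins until the soft-lockup detector fires. Applying the fix is
the usual routine; the patch file name below is made up for the example:

	# Hypothetical file name; apply against a stock tree and rebuild.
	cd linux-2.6.29
	patch -p1 < ext4-mballoc-unlock-typo.patch
	make && make modules_install install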
* Re: Once more: Recovering a damaged ext4 fs?
  From: Theodore Tso @ 2009-03-28 12:30 UTC
  To: J.D. Bakker; +Cc: linux-ext4

On Sat, Mar 28, 2009 at 12:47:19AM +0100, J.D. Bakker wrote:
>
>> P.S. You were using a completely stock kernel, correct? No other
>> patches installed?
>
> Yes.

Hi J.D.,

Can you verify exactly what kernel version you are using? This issue
is being discussed at:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/330824

... and one person has said it was fixed in 2.6.28-rc8, and at least
one, possibly two people have reported that it has been fixed in
2.6.29. I think you're running some version of 2.6.28 or 2.6.28.y,
correct?

If these reports are correct, the problem may be fixed already in
mainline, but I'd like to figure out *which* patch made the problem go
away, so we can get it backported into various distribution kernels and
into the 2.6.28.* and possibly the 2.6.27.* stable series.

Thanks,

						- Ted
* Re: Once more: Recovering a damaged ext4 fs?
  From: J.D. Bakker @ 2009-03-28 12:53 UTC
  To: Theodore Tso; +Cc: linux-ext4

At 08:30 -0400 28-03-2009, Theodore Tso wrote:
>Can you verify exactly what kernel version you are using? This issue
>is being discussed at:
>
>https://bugs.launchpad.net/ubuntu/+source/linux/+bug/330824
>
>... and one person has said it was fixed in 2.6.28-rc8, and at least
>one, possibly two people have reported that it has been fixed in
>2.6.29. I think you're running some version of 2.6.28 or 2.6.28.y,
>correct?

Had the problem with 2.6.28, 2.6.28.4 and 2.6.29-rc6. After yesterday's
crash I've upgraded to 2.6.29. For completeness' sake, my ext4 RAID had
been created from scratch, not upgraded from ext3.

Should I still apply the one-liner you posted yesterday on 2.6.29?

In the meantime I've tried mkfs -S; this complained about "File exists
while trying to create journal". fsck -y is running (has been for a few
hours) and appears to cycle through

 Group xx inode table at yy conflicts with some other fs block.
 Relocate?
 [repeated enough times to overflow my xterm's scrollback buffer]
 Root inode is not a directory. Clear?

We'll see what I can fish out of the lost+found once it's done.

JDB.
-- 
LART. 250 MIPS under one Watt. Free hardware design files.
http://www.lartmaker.nl/
* Re: Once more: Recovering a damaged ext4 fs?
  From: Theodore Tso @ 2009-03-28 13:09 UTC
  To: J.D. Bakker; +Cc: linux-ext4

On Sat, Mar 28, 2009 at 01:53:35PM +0100, J.D. Bakker wrote:
> At 08:30 -0400 28-03-2009, Theodore Tso wrote:
>> Can you verify exactly what kernel version you are using? This issue
>> is being discussed at:
>>
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/330824
>> ...
>
> Had the problem with 2.6.28, 2.6.28.4 and 2.6.29-rc6. After yesterday's
> crash I've upgraded to 2.6.29. For completeness' sake, my ext4 RAID had
> been created from scratch, not upgraded from ext3.

Thanks, I've updated your information on the Launchpad bug site. If we
can get independent confirmation, it appears the bug was fixed sometime
between 2.6.28-rc6 and 2.6.28-rc8.

> Should I still apply the one-liner you posted yesterday on 2.6.29?

Yeah, it's definitely a good fix to have in hand.

> In the meantime I've tried mkfs -S; this complained about "File exists
> while trying to create journal". fsck -y is running (has been for a few
> hours) and appears to cycle through

Oops, I need to fix mke2fs to handle this case better. (I assume you
were doing "mke2fs -S -t ext4 /dev/XXX", or something like that, right?)
You should be able to work around the "File exists..." error via this
command:

	debugfs -w /dev/XXXX -R "clri <8>"

... and then retrying the mke2fs -S command.

Regards,

						- Ted
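Put together, the workaround Ted is suggesting is a three-step sequence,
with /dev/md0 standing in for the real device and with the same last-resort
caveats as before:

	# 1. Clear the stale journal inode (inode 8) that makes
	#    mke2fs -S bail out with "File exists".
	debugfs -w /dev/md0 -R "clri <8>"
	# 2. Rewrite just the superblock and group descriptors.
	mke2fs -S -t ext4 /dev/md0
	# 3. Run a full forced check to rebuild everything else.
	e2fsck -fy /dev/md0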
* Re: Once more: Recovering a damaged ext4 fs?
  From: J.D. Bakker @ 2009-03-29 22:01 UTC
  To: Theodore Tso; +Cc: linux-ext4

At 09:09 -0400 28-03-2009, Theodore Tso wrote:
>On Sat, Mar 28, 2009 at 01:53:35PM +0100, J.D. Bakker wrote:
>> In the meantime I've tried mkfs -S; this complained about "File exists
>> while trying to create journal". fsck -y is running (has been for a
>> few hours) and appears to cycle through
>
>You should be able to work around the "File exists..." error via this
>command:
>
>	debugfs -w /dev/XXXX -R "clri <8>"
>
>... and then retrying the mke2fs -S command.

Tried that; it gave somewhat unexpected results. I cancelled the running
fsck, and issued 'debugfs -w /dev/md0 -R "clri <8>"'. This appeared to
work, but when I retried the mkfs -S, I still got the "File exists while
trying to create journal" error. I re-issued the debugfs command, which
then failed with

 debugfs 1.41.4 (27-Jan-2009)
 /dev/md0: Bad magic number in super-block while opening filesystem

I have restarted the fsck (e2fsck -yv /dev/md0), but it appears to be
stuck in a loop:

 e2fsck 1.41.4 (27-Jan-2009)
 ./e2fsck/e2fsck: Superblock invalid, trying backup blocks...
 Group descriptor 1 checksum is invalid. Fix? yes
 Group descriptor 2 checksum is invalid. Fix? yes
 [...]
 Group descriptor 27775 checksum is invalid. Fix? yes
 Group descriptor 27941 checksum is invalid. Fix? yes
 newraidfs contains a file system with errors, check forced.
 Pass 1: Checking inodes, blocks, and sizes
 Group 859's inode table at 3080346 conflicts with some other fs block.
 Relocate? yes
 Group 860's block bitmap at 33161701 conflicts with some other fs block.
 Relocate? yes
 [...]
 Group 25840's inode table at 846725656 conflicts with some other fs block.
 Relocate? yes
 Group 25840's inode table at 846725657 conflicts with some other fs block.
 Relocate? yes
 Root inode is not a directory. Clear? yes
 [no output for a few minutes]
 Error allocating 1 contiguous block(s) in block group 175 for block
 bitmap: Could not allocate block in ext2 filesystem
 Error allocating 512 contiguous block(s) in block group 175 for inode
 table: Could not allocate block in ext2 filesystem
 Error allocating 1 contiguous block(s) in block group 769 for inode
 bitmap: Could not allocate block in ext2 filesystem
 [...]
 Error allocating 512 contiguous block(s) in block group 16353 for inode
 table: Could not allocate block in ext2 filesystem
 Error allocating 512 contiguous block(s) in block group 25840 for inode
 table: Could not allocate block in ext2 filesystem
 Restarting e2fsck from the beginning...
 ./e2fsck/e2fsck: Group descriptors look bad... trying backup blocks...
 Group descriptor 1 checksum is invalid. Fix? yes

...and it starts all over again. I had left it running overnight; in
the morning it had produced the exact same output 97 times. Over those
runs the e2fsck process grew from a few hundred MB to 3GB (all of the
RAM installed in the machine), and had pushed all other processes out
to swap. The full log file is available at
http://lartmaker.nl/ext4/e2fsck-md0-20090327-yv-2.txt . I have since
killed e2fsck in the belief that if 97 passes weren't going to do it,
number 98 would be unlikely to help much.

Is there anything else I can do? Before the crash the fs was ~66% full,
so I'm not sure why e2fsck fails to allocate blocks.

Thanks,
JDB.
-- 
LART. 250 MIPS under one Watt. Free hardware design files.
http://www.lartmaker.nl/
* Re: Once more: Recovering a damaged ext4 fs?
  From: Theodore Tso @ 2009-03-31 12:42 UTC
  To: J.D. Bakker; +Cc: linux-ext4

OK, here's a patch that should allow mke2fs -S to work. It should be
applied against e2fsprogs 1.41.4. Sorry for the delay in getting this
to you; things have been crazy busy on my end.

						- Ted

commit a620baddee647faf42c49ee2e04ee3f667149d68
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Tue Mar 31 07:42:24 2009 -0400

    mke2fs: Don't try to create the journal in super-only mode

    Since we aren't initializing the inode table, creating the journal
    will just fail.

    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index e69e5ce..4f50ffa 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -2079,6 +2079,12 @@ int main (int argc, char *argv[])
 			    EXT3_FEATURE_COMPAT_HAS_JOURNAL)) {
 		journal_blocks = figure_journal_size(journal_size, fs);
 
+		if (super_only) {
+			printf(_("Skipping journal creation in super-only mode\n"));
+			fs->super->s_journal_inum = EXT2_JOURNAL_INO;
+			goto no_journal;
+		}
+
 		if (!journal_blocks) {
 			fs->super->s_feature_compat &=
 				~EXT3_FEATURE_COMPAT_HAS_JOURNAL;
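With that patch applied, the earlier mke2fs -S attempt should get past the
journal-creation failure. A sketch of rebuilding and retrying, with a
hypothetical patch file name and the same placeholder device:

	# Rebuild the patched mke2fs from the 1.41.4 source tree,
	# then retry the super-only rewrite and the full check.
	cd e2fsprogs-1.41.4
	patch -p1 < mke2fs-super-only-journal.patch
	./configure && make
	./misc/mke2fs -S -t ext4 /dev/md0
	e2fsck -fy /dev/md0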