Re: fsck infinite loop on corrupt ext4 file system

linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Theodore Tso <tytso@mit.edu>
To: Frank Mayhar <fmayhar@google.com>
Cc: linux-ext4@vger.kernel.org
Subject: Re: fsck infinite loop on corrupt ext4 file system
Date: Tue, 18 Aug 2009 12:01:55 -0400	[thread overview]
Message-ID: <20090818160155.GC28560@mit.edu> (raw)
In-Reply-To: <1250557822.23227.9.camel@bobble.smo.corp.google.com>

On Mon, Aug 17, 2009 at 06:10:22PM -0700, Frank Mayhar wrote:
> It's clear that fsck is neither correcting the block groups nor is it
> detecting the bad entries properly (a sanity check might be in order
> here).  It's not even noticing that it's looping, it just keeps failing
> the allocation and retrying.  While it may be that fsck can't recover
> the file system in this case, it should at least notice and abort.
> 
> My thinking is that the location of the inode tables should be invariant
> over the life of the file system.  Certainly there's no place in ext4
> itself that changes those fields (that I can see, anyway).  Why couldn't
> fsck compute the proper values and compare those against what's there?

So there are a couple of things going on here.  The first is that the
code which tries to allocate new inode/block allocation bitmaps or
inode tables wasn't taught that filesystems with the FLEX_BG feature
should have the metadata located at the beginning of the
flex-blockgroup, but if we can't find space for it there (allocating
the inode table is tricky since it requires possibly up to a few
hundred contiguous free blocks), we should try to find the space
anywhere in the filesystem.  If it can't find the space, we should
indeed abort.  Please find attached a patch which should fix e2fsck to
handle this case correctly.  Could you test it and let me know if it
works correctly?

As far as assuming the inode tables are invariant over the life of the
filesystem --- this is normally true, but inode tables can be located
in places other than the default; for example if bad blocks located
where the inode tables should be, then the inode tables can be pushed
to non-standard locations.  So this makes calculating where the inode
table "should" be a little tricky, especially since the contents of
the bad blocks can change after the filesystem is formatted.

In addition, e2fsck tries very hard not to destroy data, and so there
is the question of what to do if there are data blocks located where
the inode table "should" be.  In theory e2fsck should be able to move
the inode data blocks elsewhere, or if there is no space, potentially
the offer to delete a user file to make room for the inode table ---
after all, better sacrifice one or two data files rather than lose
potentially several hundred or thousand files.  But this is a level of
complexity that I never had a chance to add to e2fsck, and in truth
the case where we run into this level of lossage is very rare.

After all, most of the time we have so many copies of the block group
descriptors, and the backup group descripts are rarely written, so
most of the time this level of corruption should be quite rare.
Making e2fsck smarter to deal with the most extreme cases of loss is
therefore desirable, but it's always been a "nice to have".

In any case, with ext4 and the flex_bg feature, the ability to
allocate the inode table anywhere in the filesystem should make the
case where the really complex recovery code even more rarely required.

Please try this patch and see if it fixes things up for you or not.

Thanks!!

						- Ted

diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index 518c2ff..203468b 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -2376,9 +2376,10 @@ static void new_table_block(e2fsck_t ctx, blk_t first_block, int group,
 			    const char *name, int num, blk_t *new_block)
 {
 	ext2_filsys fs = ctx->fs;
+	dgrp_t		last_grp;
 	blk_t		old_block = *new_block;
 	blk_t		last_block;
-	int		i;
+	int		i, is_flexbg, flexbg, flexbg_size;
 	char		*buf;
 	struct problem_context	pctx;

@@ -2388,19 +2389,44 @@ static void new_table_block(e2fsck_t ctx, blk_t first_block, int group,
 	pctx.blk = old_block;
 	pctx.str = name;

-	last_block = ext2fs_group_last_block(fs, group);
+	/*
+	 * For flex_bg filesystems, first try to allocate the metadata
+	 * within the flex_bg, and if that fails then try finding the
+	 * space anywhere in the filesystem.
+	 */
+	is_flexbg = EXT2_HAS_INCOMPAT_FEATURE(fs->super,
+					      EXT4_FEATURE_INCOMPAT_FLEX_BG);
+	if (is_flexbg) {
+		flexbg_size = 1 << fs->super->s_log_groups_per_flex;
+		flexbg = group / flexbg_size;
+		first_block = ext2fs_group_first_block(fs,
+						       flexbg_size * flexbg);
+		last_grp = group | (flexbg_size - 1);
+		if (last_grp > fs->group_desc_count)
+			last_grp = fs->group_desc_count;
+		last_block = ext2fs_group_last_block(fs, last_grp);
+	} else
+		last_block = ext2fs_group_last_block(fs, group);
 	pctx.errcode = ext2fs_get_free_blocks(fs, first_block, last_block,
-					num, ctx->block_found_map, new_block);
+					      num, ctx->block_found_map,
+					      new_block);
+	if (is_flexbg && (pctx.errcode = EXT2_ET_BLOCK_ALLOC_FAIL))
+		pctx.errcode = ext2fs_get_free_blocks(fs,
+				fs->super->s_first_data_block,
+				fs->super->s_blocks_count,
+				num, ctx->block_found_map, new_block);
 	if (pctx.errcode) {
 		pctx.num = num;
 		fix_problem(ctx, PR_1_RELOC_BLOCK_ALLOCATE, &pctx);
 		ext2fs_unmark_valid(fs);
+		ctx->flags |= E2F_FLAG_ABORT;
 		return;
 	}
 	pctx.errcode = ext2fs_get_mem(fs->blocksize, &buf);
 	if (pctx.errcode) {
 		fix_problem(ctx, PR_1_RELOC_MEMORY_ALLOCATE, &pctx);
 		ext2fs_unmark_valid(fs);
+		ctx->flags |= E2F_FLAG_ABORT;
 		return;
 	}
 	ext2fs_mark_super_dirty(fs);

next prev parent reply	other threads:[~2009-08-18 16:02 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-08-14 23:55 fsck infinite loop on corrupt ext4 file system Frank Mayhar
2009-08-18  1:10 ` Frank Mayhar
2009-08-18  2:47   ` Andreas Dilger
2009-08-18 16:01   ` Theodore Tso [this message]
2009-08-18 16:31     ` Frank Mayhar
2009-08-18 17:03       ` Theodore Tso
2009-08-18 19:03         ` Andreas Dilger

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:518c2ff dfblob:203468b )
 OR (
bs:"Re: fsck infinite loop on corrupt ext4 file system" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090818160155.GC28560@mit.edu \
    --to=tytso@mit.edu \
    --cc=fmayhar@google.com \
    --cc=linux-ext4@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).