public inbox for linux-ext4@vger.kernel.org
From: Andreas Dilger <adilger@clusterfs.com>
To: Eric Sandeen <sandeen@redhat.com>
Cc: ext4 development <linux-ext4@vger.kernel.org>
Subject: Re: [PATCH/RFC] - make ext3 more robust in the face of filesystem corruption
Date: Wed, 18 Oct 2006 15:40:22 -0600
Message-ID: <20061018214022.GJ3509@schatzie.adilger.int>
In-Reply-To: <45369869.60400@redhat.com>

On Oct 18, 2006  16:11 -0500, Eric Sandeen wrote:
> First, we had a corrupted index directory that was never checked
> for consistency... it was corrupt, and pointed to another "entry"
> of length 0.  The for() loop looped forever, since the length
> of ext3_next_entry(de) was 0, and we kept looking at the same
> pointer over and over and over and over... I modeled this check
> and subsequent action on what is done for non-index directories
> in ext3_readdir... but I also see a few places where this check
> is deemed "too expensive" - any thoughts?

Hmm, in 2.6 ext2 this is handled somewhat differently - one of the main
places where ext2 and ext3 differ.  The directory leaf data is kept in
the page cache and there is a helper function ext2_check_page() to mark
the page "checked".  That means the page only needs to be checked once
after being read from disk, instead of each time through readdir.

That said, making ext3 run in that manner is major surgery, unlike your
fix.  I've seen such errors in production, so it is worthwhile to fix this.
It might be possible to have a helper function, similar to ext3_bread(),
for reading directory leaf blocks that runs the check only if
!buffer_uptodate()?
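
To make the "check only once after reading" idea concrete, here is a minimal
userspace sketch, not kernel code.  The names model_dirent, model_buf,
check_block(), read_dir_block() and put_entry() are hypothetical, the entry
layout is simplified, and the checks are only a subset of what a real
directory entry check would do.  It also rejects a rec_len of 0, which is
what caused the endless loop described above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIR_BLOCK_SIZE 64u	/* tiny block size, for illustration only */

/* Simplified entry header (hypothetical; not the real on-disk layout). */
struct model_dirent {
	uint32_t inode;
	uint16_t rec_len;	/* distance in bytes to the next entry */
	uint8_t  name_len;
	uint8_t  file_type;
};

/* Stand-in for a buffer_head: block data plus an "uptodate" flag. */
struct model_buf {
	unsigned char data[DIR_BLOCK_SIZE];
	bool uptodate;
	int  checks_run;	/* how many times the full check was run */
};

/* Walk the block once; reject any rec_len that cannot advance safely. */
bool check_block(const struct model_buf *b)
{
	size_t off = 0;

	while (off < DIR_BLOCK_SIZE) {
		struct model_dirent h;

		if (DIR_BLOCK_SIZE - off < sizeof(h))
			return false;
		memcpy(&h, b->data + off, sizeof(h));
		if (h.rec_len < sizeof(h) ||		/* includes rec_len == 0 */
		    h.rec_len % 4 != 0 ||
		    h.rec_len > DIR_BLOCK_SIZE - off)	/* overruns the block */
			return false;
		if (h.name_len > h.rec_len - sizeof(h))
			return false;
		off += h.rec_len;
	}
	return true;
}

/*
 * "Read" a block from disk and validate it only if the buffer is not
 * already uptodate.  Returns 0 on success, -1 if the block is corrupt.
 */
int read_dir_block(struct model_buf *b, const unsigned char *disk)
{
	if (!b->uptodate) {
		memcpy(b->data, disk, DIR_BLOCK_SIZE);
		b->checks_run++;
		if (!check_block(b))
			return -1;
		b->uptodate = true;
	}
	return 0;
}

/* Test helper: write one entry header and its name at the given offset. */
void put_entry(unsigned char *blk, size_t off, uint32_t ino,
	       uint16_t rec_len, const char *name)
{
	struct model_dirent h = { ino, rec_len, (uint8_t)strlen(name), 0 };

	memcpy(blk + off, &h, sizeof(h));
	memcpy(blk + off + sizeof(h), name, h.name_len);
}
```

In this model, calling read_dir_block() twice on the same valid block runs
check_block() only the first time, while a block containing an entry with
rec_len 0 is rejected before any caller walks it.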

> Index: linux-2.6.18/fs/ext3/namei.c
> ===================================================================
> --- linux-2.6.18.orig/fs/ext3/namei.c
> +++ linux-2.6.18/fs/ext3/namei.c
> @@ -551,6 +551,15 @@ static int htree_dirblock_to_tree(struct
>  					   dir->i_sb->s_blocksize -
>  					   EXT3_DIR_REC_LEN(0));
>  	for (; de < top; de = ext3_next_entry(de)) {
> +		if (!ext3_check_dir_entry("htree_dirblock_to_tree", dir, de, bh,
> +					(block<<EXT3_BLOCK_SIZE_BITS(dir->i_sb))
> +						+((char *)de - bh->b_data))) {
> +			/* On error, skip the f_pos to the next block. */
> +			dir_file->f_pos = (dir_file->f_pos | 
> +					(dir_file->i_sb->s_blocksize - 1)) + 1;
> +			brelse (bh);
> +			return count;
> +		}
>  		ext3fs_dirhash(de->name, de->name_len, hinfo);
>  		if ((hinfo->hash < start_hash) ||
>  		    ((hinfo->hash == start_hash) &&
> 
> Next we had a root directory inode which had a corrupted size, claimed
> to be > 200M on a 4M filesystem.  ext3_get_blocks_handle() was returning 0,
> meaning that lookup failed.  (there was only really 1 block in the directory, 
> but because the size was so large, readdir kept coming back for more...)
>
> instead of catching the no-block-at-this-offset error, we fell into the
> !bh case, which assumed that there had been an IO error, and kept on trying
> 200M+ of blocks that didn't exist.  I -think- it makes more sense to realize
> that if ext3_get_blocks_handle returns 0, there is a hole at this location,
> (as described by the on-disk metadata) and something has gone wrong.
> 
> Index: linux-2.6.18/fs/ext3/dir.c
> ===================================================================
> --- linux-2.6.18.orig/fs/ext3/dir.c
> +++ linux-2.6.18/fs/ext3/dir.c
> @@ -141,6 +141,11 @@ static int ext3_readdir(struct file * fi
>  					(PAGE_CACHE_SHIFT - inode->i_blkbits),
>  				1);
>  			bh = ext3_bread(NULL, inode, blk, 0, &err);
> +		} else {
> +			ext3_error(sb, "ext3_readdir",
> +			 "directory #%lu block %lu lookup failed, corrupt dir",
> +			 	inode->i_ino, blk);
> +                        return -EINVAL;
>  		}
>  
>  		/*
> 
> I'm not so sure about this one, though - seems like maybe also it should test
> for an actual error case (< 0) from ext3_get_blocks_handle as well.

I'm not sure whether this is a win or not.  It means that if there is ever
a directory with a bad leaf block, any entries beyond that block are no
longer accessible.  The existing !bh case already marks the filesystem in
error.  Maybe as a special case we can check in "if (!bh)" whether i_size
and i_blocks make sense.  Something like:

		if (!bh) {
			:
			:
+			if (filp->f_pos > inode->i_blocks << 9)
+				break;
			filp->f_pos += sb->s_blocksize - offset;
			continue;
		}

This obviously won't help if the whole inode is bogus, but then nothing
will catch all errors.
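
As a rough illustration of the effect, here is a small userspace model of
the readdir loop, not the kernel code itself.  The struct model_inode, its
mapped_blocks field and scan_dir() are hypothetical names; the model assumes
that only the first mapped_blocks blocks of the directory can be looked up
and that every later lookup lands in the !bh case.

```c
#include <stdint.h>

#define SECTOR_SHIFT 9	/* i_blocks counts 512-byte units, hence "<< 9" */

struct model_inode {
	uint64_t i_size;	/* claimed directory size in bytes */
	uint64_t i_blocks;	/* allocated 512-byte units */
	uint32_t blocksize;	/* filesystem block size in bytes */
	uint64_t mapped_blocks;	/* leading blocks that really exist (model only) */
};

/* Returns how many block lookups are attempted before the loop ends. */
uint64_t scan_dir(const struct model_inode *inode, int use_bound)
{
	uint64_t f_pos = 0, lookups = 0;

	while (f_pos < inode->i_size) {
		uint64_t blk = f_pos / inode->blocksize;
		uint32_t offset = (uint32_t)(f_pos % inode->blocksize);

		lookups++;
		if (blk >= inode->mapped_blocks) {	/* models the "!bh" case */
			/* Stop once f_pos is past what i_blocks can cover. */
			if (use_bound &&
			    f_pos > (inode->i_blocks << SECTOR_SHIFT))
				break;
			f_pos += inode->blocksize - offset;
			continue;
		}
		/* Block exists: pretend its entries were consumed. */
		f_pos += inode->blocksize - offset;
	}
	return lookups;
}
```

With i_size claiming 200M, a 1k block size and a single real block
(i_blocks = 2), the unbounded loop in this model attempts 204800 lookups,
while the bounded one stops after 3.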

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
