From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org
Subject: Re: [PATCH 07/18] e2fsck: verify checksums after checking everything else
Date: Mon, 28 Jul 2014 01:27:43 -0700
Message-ID: <20140728082743.GN8628@birch.djwong.org>
In-Reply-To: <20140726205316.GO6725@thunk.org>
On Sat, Jul 26, 2014 at 04:53:16PM -0400, Theodore Ts'o wrote:
> On Fri, Jul 25, 2014 at 05:34:22PM -0700, Darrick J. Wong wrote:
> > There's a particular problem with e2fsck's user interface where
> > checksum errors are concerned: Fixing the first complaint about
> > a checksum problem results in the inode being cleared even if e2fsck
> > could otherwise have recovered it. While this mode is useful for
> > cleaning the remaining broken crud off the filesystem, we could at
> > least default to checking everything /else/ and only complaining about
> > the incorrect checksum if fsck finds nothing else wrong.
> >
> > So, plumb in a config option. We default to "verify and checksum"
> > unless the user tells us otherwise.
>
> I'm not convinced this is the right way to go. Telling the user that
> they need to muck with the config file depending on what sort of file
> system corruption they have seems rather unsatisfying.
>
> This is what I'd much rather do. Add a "sanity checking" mode to the
> inode scanning functions which gets enabled when EXT2_SF_SANITY_CHECK
> is set via ext2fs_inode_scan_flags(). What the sanity check mode does
> is every time the inode scan functions read in a new inode table
> block, it performs a "sanity check" on the inode table block.
>
> The sanity check is carried out as follows. If a majority of the
> inodes in the inode table block are "insane" then set the
> EXT2_SF_INSANE_ITABLE_BLOCK flag in the scan flags; if not, clear
> this flag. If the checksum is incorrect, the inode is considered
> insane. If the extent flag is set, and the extent header looks
> insane, then the inode is considered insane. For indirect blocks,
> if more than 50% of the blocks in i_block[] are invalid, then the
> inode is considered insane.
>
> This is basically a simplified version of an algorithm which Andreas
> has been carrying in Lustre's e2fsprogs for a while, which tries to
> apply a heuristic check over multiple inodes to decide whether we
> would be better off just zapping all of the inodes in an inode table
> block. The reason why I never integrated that change into mainline is
> that in order to make it work, it violated a large number of
> abstractions, and so I considered it too ugly to live.
>
> The advantage of doing this all inside lib/ext2fs/inode.c's inode
> scanning function is that it's much cleaner. We can't do as many
> checks as Andreas did, but for the rough heuristic of deciding whether
> we have a minor problem in a single inode, or a massive problem caused
> by garbage written into the inode table or another inode table block
> getting written into the wrong place on disk (which we can only do if
> metadata checksums are enabled, but that's OK), we can get away with
> doing only the obvious "local" checks.
>
> After all, in practice, it's usually either problems in a single inode
> (usually caused by a kernel bug or a memory bit flip), or complete
> garbage written into the inode table block, or an inode table block
> written to the wrong place on disk, on top of another inode table
> block. So we just need a rough heuristic to distinguish between these
> cases.
>
> Once we've decided whether the entire inode table block is insane or
> not, then what we do is if an inode has any problems at all during the
> pass1 scan, we check to see if the inode table block is marked insane.
> If it is considered insane, then we just clear the i_links_count and
> set dtime, effectively zapping the inode, no questions asked.
> Otherwise, we proceed doing the individual fix ups of each inode field.
>
> Does that make sense?

Yes, that makes sense for dealing with the inodes. What about the other FS
object blocks, such as directories, EAs, and extents?

Perhaps I'll try to define some insane heuristics:

For EA and extent blocks we could declare the block insane if the checksum
fails and the magic number is missing. Seems pretty straightforward.

For classic directory blocks, we could declare the block insane if the checksum
fails and the end of the block is not {00 00 00 00 0C 00 00 DE XX XX XX XX}.
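
As a rough illustration of the tail test above, here is a minimal sketch
that compares the last 12 bytes of a directory block against the fixed
part of that pattern and ignores the four checksum bytes. The helper name
and the constant are made up for this example and are not existing
e2fsprogs symbols.

```c
#include <stddef.h>
#include <string.h>

/* Fake dirent at the end of the block: inode (4 bytes, zero),
 * rec_len (2 bytes, 12), name_len (1 byte, zero), file type (1 byte,
 * 0xDE), then the 4-byte checksum. Hypothetical constant name. */
#define DIRENT_TAIL_SIZE 12

/* Hypothetical helper: returns 1 if the end of the block matches
 * {00 00 00 00 0C 00 00 DE XX XX XX XX}, 0 otherwise. The checksum
 * bytes (XX) are deliberately not examined. */
static int dirent_tail_looks_sane(const unsigned char *block, size_t blocksize)
{
	static const unsigned char expected[8] = {
		0x00, 0x00, 0x00, 0x00,	/* inode == 0 */
		0x0C, 0x00,		/* rec_len == 12, little-endian */
		0x00,			/* name_len == 0 */
		0xDE			/* reserved file type marker */
	};

	if (block == NULL || blocksize < DIRENT_TAIL_SIZE)
		return 0;
	return memcmp(block + blocksize - DIRENT_TAIL_SIZE, expected,
		      sizeof(expected)) == 0;
}
```

A classic directory block would then be declared insane only when its
checksum fails and this function returns 0.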

For htree directory blocks, we could similarly declare insanity if the checksum
fails and the beginning of the block does not contain the required fake dir
entries.

...and if it's insane, zap it immediately; otherwise, run the usual checks and
fix the checksum if the other checks pass.
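
To make the decision concrete, here is a minimal sketch of the rule for EA
and extent blocks: a block is zapped only when the checksum fails and the
magic number is missing, and everything else goes through the usual checks.
The magic values are the on-disk ones (0xF30A for an extent header,
0xEA020000 for an EA block header, both little-endian at offset 0), but the
macro, enum, and function names are invented for this example, and the
checksum result is taken as an input rather than computed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the values are the on-disk magic numbers. */
#define SKETCH_EXTENT_MAGIC	0xF30AU		/* 16-bit, offset 0 */
#define SKETCH_EA_MAGIC		0xEA020000UL	/* 32-bit, offset 0 */

enum block_verdict {
	VERDICT_RUN_CHECKS,	/* run the usual checks, fix csum if they pass */
	VERDICT_ZAP		/* insane: clear the block, no questions asked */
};

/* Read little-endian values without assuming host byte order. */
static uint16_t get_le16(const unsigned char *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static int extent_block_has_magic(const unsigned char *block, size_t len)
{
	return block != NULL && len >= 2 &&
	       get_le16(block) == SKETCH_EXTENT_MAGIC;
}

static int ea_block_has_magic(const unsigned char *block, size_t len)
{
	return block != NULL && len >= 4 &&
	       get_le32(block) == SKETCH_EA_MAGIC;
}

/* Insane means: checksum failed AND the structural signature is gone.
 * csum_ok and signature_ok are supplied by the caller. */
static enum block_verdict classify_block(int csum_ok, int signature_ok)
{
	if (!csum_ok && !signature_ok)
		return VERDICT_ZAP;
	return VERDICT_RUN_CHECKS;
}
```

The same classify_block() shape would apply to directory blocks, with the
dirent tail or the fake htree entries playing the role of the signature.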

Hmm, that doesn't seem so bad. What do people think?

--D
>
> - Ted