From: Valerie Henson <val_henson@linux.intel.com>
To: David Chinner <dgc@sgi.com>
Cc: Amit Gud <gud@ksu.edu>, Nikita Danilov <nikita@clusterfs.com>,
David Lang <david.lang@digitalinsight.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
riel@surriel.com, zab@zabbo.net, arjan@infradead.org,
suparna@in.ibm.com, brandon@ifup.org, karunasagark@gmail.com
Subject: Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck
Date: Wed, 25 Apr 2007 16:03:44 -0700
Message-ID: <20070425230344.GC16129@nifty>
In-Reply-To: <20070425105434.GX32602149@melbourne.sgi.com>

On Wed, Apr 25, 2007 at 08:54:34PM +1000, David Chinner wrote:
> On Tue, Apr 24, 2007 at 04:53:11PM -0500, Amit Gud wrote:
> >
> > The structure looks like this:
> >
> >  ----------             ----------
> > | cnode 0  |---------->| cnode 0  |----------> to another cnode or NULL
> >  ----------             ----------
> > | cnode 1  |-----      | cnode 1  |-----
> >  ----------     |       ----------     |
> > | cnode 2  |--  |      | cnode 2  |--  |
> >  ----------  |  |       ----------  |  |
> > | cnode 3  | |  |      | cnode 3  | |  |
> >  ----------  |  |       ----------  |  |
> >           |  |  |                |  |  |
> >
> >           inodes                 inodes or NULL
>
> How do you recover if fsfuzzer takes out a cnode in the chain? The
> chunk is marked clean, but clearly corrupted and needs fixing and
> you don't know what it was pointing at. Hence you have a pointer to
> a trashed cnode *somewhere* that you need to find and fix, and a
> bunch of orphaned cnodes that nobody points to *somewhere else* in
> the filesystem that you have to find. That's a full scan fsck case,
> isn't it?
Excellent question. This is one of the trickier aspects of chunkfs -
the orphan inode problem (tricky, but solvable). The problem: what
happens if you smash/lose/corrupt an inode in one chunk that has a
continuation inode in another chunk? A back pointer does you no good
if the back pointer is corrupted.

What you do is keep tabs on whether you see damage that looks like
this has occurred - e.g., inode use/free counts are wrong, or you had
to zero a corrupted inode - and when this happens, you scan all
continuation inodes in chunks that have links to the corrupted chunk.

What you need to make this go fast is (1) a pre-made list of which
chunks have links with which other chunks, and (2) a fast way to read
all of the continuation inodes in a chunk (ignoring chunk-local
inodes). This stage is approximately O(fs size), but it should be
quite swift.
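
To make the scan above concrete, here is a rough sketch in Python. It
is not taken from the patch: ChunkLinks, find_orphans, the tuple layout
for continuation inodes, and the parent_ok callback are all names I am
making up for illustration. It only assumes the two things listed
above: a pre-made chunk link list, and a way to enumerate just the
continuation inodes of a chunk.

```python
from collections import defaultdict


class ChunkLinks:
    """Hypothetical pre-made list of which chunks have links with which."""

    def __init__(self):
        self._links = defaultdict(set)

    def add_link(self, a, b):
        # Links are recorded in both directions so that either side
        # can find the other after damage.
        self._links[a].add(b)
        self._links[b].add(a)

    def neighbours(self, chunk):
        # .get() avoids creating an entry for an unknown chunk.
        return sorted(self._links.get(chunk, ()))


def find_orphans(damaged, links, continuation_inodes, parent_ok):
    """Return continuation inodes whose parent in `damaged` is gone.

    continuation_inodes: dict mapping chunk -> list of
        (inode_no, parent_chunk, parent_inode_no) tuples (assumed layout).
    parent_ok: callable (parent_chunk, parent_inode_no) -> bool that
        reports whether the parent inode survived (assumed helper).
    """
    orphans = []
    # Only chunks linked to the damaged chunk are visited;
    # chunk-local inodes are never read at all.
    for chunk in links.neighbours(damaged):
        for ino, pchunk, pino in continuation_inodes.get(chunk, []):
            if pchunk == damaged and not parent_ok(pchunk, pino):
                orphans.append((chunk, ino))
    return orphans
```

The work done is bounded by the number of continuation inodes in the
linked chunks rather than by the total inode count, which is why the
scan should stay cheap as long as cross-chunk links are rare.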
> It seems that any sort of damage to the underlying storage (e.g.
> media error, I/O error or user brain explosion) results in the need
> to do a full fsck and hence chunkfs gives you no benefit in this
> case.
I worry about this, but so far I haven't found a case that couldn't
be cut down significantly with just a little extra work. It might be
helpful to look at an extreme case.

Let's say we're incredibly paranoid. We could be justified in running
a full fsck on the entire file system in between every single I/O.
After all, something *might* have been silently corrupted. But this
would be ridiculously slow. We could instead never check the file
system. But then we would end up panicking and corrupting the file
system a lot. So what's a good compromise?

In the chunkfs case, here are my rules of thumb so far:
1. Detection: All metadata has magic numbers and checksums.
2. Scrubbing: Random check of chunks when possible.
3. Repair: When we detect corruption, whether through a checksum
   error, a file system code assertion failure, or the hardware
   telling us we have a bug, check the chunk containing the error and
   any outside-chunk information that could be affected by it.
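
As a rough illustration of rules 1 and 3, here is a small Python
sketch. Everything in it is an assumption for the sake of the example:
the magic value, the 8-byte header layout, the use of CRC32, and the
names seal, verify, and repair_scope are mine, not the on-disk format
from the patch.

```python
import struct
import zlib

MAGIC = 0x43484B46  # hypothetical magic value ("CHKF"), not from the patch
HEADER = struct.Struct("<II")  # assumed layout: magic, CRC32 of payload


def seal(payload: bytes) -> bytes:
    """Prefix a metadata payload with its magic number and checksum."""
    return HEADER.pack(MAGIC, zlib.crc32(payload)) + payload


def verify(block: bytes) -> bool:
    """Rule 1: accept a block only if magic and checksum both match."""
    if len(block) < HEADER.size:
        return False
    magic, crc = HEADER.unpack_from(block)
    return magic == MAGIC and crc == zlib.crc32(block[HEADER.size:])


def repair_scope(bad_chunk, linked):
    """Rule 3: the damaged chunk plus any chunks it has links with.

    linked: dict mapping chunk -> set of chunks it shares links with
    (an assumed in-memory form of the chunk link list).
    """
    return [bad_chunk] + sorted(linked.get(bad_chunk, ()))
```

For example, if verify() fails on a block in chunk 2 and chunk 2 has
links with chunks 3 and 5, repair_scope(2, {2: {3, 5}}) names only
chunks 2, 3, and 5 for checking, not the whole file system.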
-VAL