From: Matt Mackall <mpm@selenic.com>
To: Valerie Henson <val@nmt.edu>
Cc: Theodore Tso <tytso@mit.edu>, linux-fsdevel@vger.kernel.org
Subject: Re: [RFC] TileFS - a proposal for scalable integrity checking
Date: Fri, 11 May 2007 10:55:28 -0500
Message-ID: <20070511155527.GU11115@waste.org>
In-Reply-To: <20070511094641.GB27797@rainbow>

On Fri, May 11, 2007 at 03:46:41AM -0600, Valerie Henson wrote:
> On Wed, May 09, 2007 at 02:51:41PM -0500, Matt Mackall wrote:
> >
> > We will, unfortunately, need to be able to check an entire directory
> > at once. There's no other efficient way to assure that there are no
> > duplicate names in a directory, for instance.
>
> I don't see that being a major problem for the vast majority of
> workloads.
>
> > In summary, checking a tile requires trivial checks on all the inodes
> > and directories that point into a tile. Inodes, directories, and data
> > that are inside a tile get checked more thoroughly but still don't
> > need to do much pointer chasing.
>
> Okay, I'm totally convinced - checking a tile-at-a-time works! I'm
> going to steal as many of your ideas as possible and write ChileFS. :)
> Per-block inode rmap in particular has so many advantages that I'm
> ranking it up with checksums as a must-have feature.
>
> Now for the hard part: repair. If you find an indirect block or
> extent with a bad checksum, how much of the file system are you going
> to have to read to fix the dangling blocks? I can see a speed-up by
> reading just the rmaps and looking for the associated inode number.
> What about an inode that has been corrupted? You could at least get
> the inode number out of the rmap, but your pointers to your first
> level indirect blocks are gone. A directory block? No way to get
> useful information out of the rmap there. Ad nauseam. This is where
> I really like having the encapsulation of chunkfs, despite all the
> nasty continuation inode bits.

There are a few interesting possibilities here.

First we've got several new sources of integrity checking. If a block
points to an inode and the inode doesn't point back to it, we follow
the inode's corresponding pointer forward and back. If that turns out
to be consistent, we know that the original block is in the wrong. If
the two disagree and neither pointer checks out, we can also check the
tile header and inode CRCs.
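
To make the forward-and-back check concrete, here is a rough sketch in
Python. The data layout is an assumption for illustration only: `inodes`
maps an inode number to its {offset: block} forward pointers, and `rmap`
maps a block number to the (inode, offset) it claims in its tile header.
The function name and return strings are hypothetical too.

```python
def judge_stray_block(block, inodes, rmap):
    """Classify a block by comparing its back pointer with the inode's forward pointer.

    Assumed layout (not from the proposal):
      inodes: {inode_number: {offset: block_number}}   forward pointers
      rmap:   {block_number: (inode_number, offset)}   back pointers
    """
    ino, offset = rmap[block]
    # Follow the inode's corresponding pointer forward.
    forward = inodes.get(ino, {}).get(offset)
    if forward == block:
        return "consistent"
    # The inode points elsewhere: follow that block's pointer back.
    if forward is not None and rmap.get(forward) == (ino, offset):
        # Inode and the other block agree, so the original block is wrong.
        return "block_is_wrong"
    # Neither pointer checks out; fall back to tile header and inode CRCs.
    return "need_crc_check"


inodes = {7: {0: 100, 1: 101}}
rmap = {100: (7, 0), 101: (7, 1), 205: (7, 1), 300: (7, 2)}
print(judge_stray_block(205, inodes, rmap))
```

Here block 205 claims inode 7 at offset 1, but the inode points at block
101 there and 101 points back, so 205 loses the argument.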

Failing that, stray blocks can get attached to an inode in lost+found,
and we can reattach them later during a full filesystem sweep.
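
One way the park-then-reattach idea might look, again as a sketch under
assumptions: the lost+found inode number, the `parked` bookkeeping dict,
and the rule "an inode's forward pointer wins during the sweep" are all
made up here for illustration. The `inodes` and `rmap` dicts have the
same assumed shape: {inode: {offset: block}} and {block: (inode, offset)}.

```python
LOST_FOUND_INO = 11  # hypothetical inode number for the lost+found holder


def park_stray(block, rmap, parked):
    """Attach a stray block to the lost+found inode, remembering its old claim."""
    parked[block] = rmap.get(block)
    # Assumption: use the block number as the offset inside the holder inode.
    rmap[block] = (LOST_FOUND_INO, block)


def sweep_reattach(inodes, rmap, parked):
    """Full sweep: any inode that still points at a parked block reclaims it."""
    reattached = []
    for ino, pointers in inodes.items():
        for offset, block in pointers.items():
            if block in parked:
                rmap[block] = (ino, offset)  # repair the back pointer
                del parked[block]
                reattached.append(block)
    return sorted(reattached)


inodes = {7: {0: 100}}
rmap = {100: (9, 3)}  # back pointer is bogus: inode 9 does not exist
parked = {}
park_stray(100, rmap, parked)
print(sweep_reattach(inodes, rmap, parked))
```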

Another possibility is that we can compare forward and backward
pointers at runtime. This is probably a good idea anyway: we've got to
read tile headers for runtime CRC checks, so we might as well check the
pointers too. If we discover a pointer mismatch, we can either orphan
data blocks or recover orphans on the fly. And since a filesystem-wide
backup reads every file, it will also do most of an fsck.
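
A sketch of what the runtime comparison could amount to, with the same
caveats: the dict layout ({inode: {offset: block}} for forward pointers,
{block: (inode, offset)} for the rmap) and both function names are
assumptions, and the sketch only reports mismatches rather than deciding
whether to orphan or recover.

```python
def check_on_read(ino, inodes, rmap):
    """Compare each forward pointer of one inode with the block's back pointer."""
    mismatches = []
    for offset, block in sorted(inodes[ino].items()):
        # The tile header is being read for the CRC anyway; check the rmap too.
        if rmap.get(block) != (ino, offset):
            mismatches.append((ino, offset, block))
    return mismatches


def backup_walk(inodes, rmap):
    """Read every file, as a full backup would, collecting pointer mismatches."""
    found = []
    for ino in sorted(inodes):
        found.extend(check_on_read(ino, inodes, rmap))
    return found


inodes = {7: {0: 100, 1: 101}, 8: {0: 200}}
rmap = {100: (7, 0), 101: (9, 4), 200: (8, 0), 300: (7, 5)}
print(backup_walk(inodes, rmap))
```

In this sketch the walk only visits blocks that some inode points to, so
a block like 300, which claims inode 7 but is referenced by nothing,
goes unseen.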
--
Mathematics is the supreme nostalgia of our time.