linux-fsdevel.vger.kernel.org archive mirror
From: "Jörn Engel" <joern@lazybastard.org>
To: Valerie Henson <val_henson@linux.intel.com>
Cc: Matt Mackall <mpm@selenic.com>, linux-fsdevel@vger.kernel.org
Subject: Re: [RFC] TileFS - a proposal for scalable integrity checking
Date: Wed, 9 May 2007 12:12:58 +0200
Message-ID: <20070509101257.GA29978@lazybastard.org>
In-Reply-To: <20070509055608.GI12859@nifty>

On Tue, 8 May 2007 22:56:09 -0700, Valerie Henson wrote:
> 
> I like it too, especially the rmap stuff, but I don't think it solves
> some of the problems chunkfs solves.  The really nice thing about
> chunkfs is that it tries hard to isolate each chunk from all the other
> chunks.  You can think of regular file systems as an OS with one big shared
> address space - any process can potentially modify any other process's
> address space, including the kernel's - and chunkfs as the modern UNIX
> private address space model.  Except in rare worst case models (the
> equivalent of a kernel bug or writing /dev/mem), the only way one
> chunk can affect another chunk is through the narrow little interface
> of the continuation inode.  This severely limits the ability of one
> chunk to corrupt another - the worst you can do is end up with the
> wrong link count on an inode pointed to from another chunk.

This leaves me a bit confused.  Imo the filesystem equivalent of
per-process address spaces would be permissions and quotas.  Indeed,
there is no guarantee where any address space's pages physically
reside.  They can be in any zone, node or even swap or regular files.

Otoh, each physical page does have an rmap of some sort - enough to
figure out who currently owns this page.  Does your own analogy work
against you?

Back to chunkfs, the really smart idea behind it imo is to take just a
small part of the filesystem, assume that everything else is flawless,
and check the small part under that assumption.  The assumption may be
wrong.  If that wrongness would affect the minimal fsck, it should get
detected as well.  Otherwise it doesn't matter right now.

There are two things I never liked about chunkfs.  First, it splits the
filesystem into an array of chunks.  With sufficiently large devices,
either the number or the size of chunks will become problematic
again.  Some sort of tree arrangement intuitively makes more sense.

Secondly, the cnodes are... weird, complicated, not well understood, a
hack.  Pick a term.  Avoiding cnodes is harder than avoiding regular
fragmentation and the recent defragment patches seem to imply we're
doing a bad job at that already.  Linked lists of cnodes - yuck.

Not directly a chunkfs problem, but still unfortunate, is that it
cannot detect medium errors.  There are no checksums.  Checksums cost
performance, so they obviously have to be optional, at the user's choice.
But not even having the option is quite 80's.

Matt's proposal is an alternative solution that can address all of my
concerns.  Instead of cnodes it has the rmap.  That is a very simple
structure I can explain to my nephews.  It allows for checksums, which
is nice as well.  And it does allow for a tree structure of tiles.

Tree structure means that each tile can have free space counters.  A
supertile (or whatever one may call it) can have a free space counter
that is the sum of all member free space counters.  And so forth
upwards.  Same for dirty bits and anything else I've forgotten.
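
To make the counter idea concrete, here is a minimal sketch in Python.
It is not taken from the TileFS proposal: the `Tile` class, its fields
and the `allocate` method are hypothetical names chosen for illustration
only.  Each parent's free space counter is the sum of its children's,
and an allocation in a leaf tile updates the counters and dirty bits of
every ancestor on the way up.

```python
class Tile:
    """Hypothetical node in a tree of tiles (illustrative names only)."""

    def __init__(self, free_blocks=0, children=None):
        self.children = children or []
        self.parent = None
        self.dirty = False
        for child in self.children:
            child.parent = self
        if self.children:
            # A supertile's counter is the sum of its members' counters.
            self.free_blocks = sum(c.free_blocks for c in self.children)
        else:
            self.free_blocks = free_blocks

    def allocate(self, n):
        """Take n blocks from a leaf tile and propagate the change upwards."""
        if self.children:
            raise ValueError("allocate on leaf tiles only")
        if n > self.free_blocks:
            raise ValueError("not enough free blocks")
        node = self
        while node is not None:
            node.free_blocks -= n   # keep every ancestor's sum consistent
            node.dirty = True       # dirty bit propagates the same way
            node = node.parent


# Two levels: a supertile with two leaves, plus one leaf directly under root.
leaf_a, leaf_b, leaf_c = Tile(100), Tile(50), Tile(25)
supertile = Tile(children=[leaf_a, leaf_b])
root = Tile(children=[supertile, leaf_c])

leaf_a.allocate(10)
print(root.free_blocks, supertile.free_blocks, leaf_c.dirty)  # 165 140 False
```

Only the path from the touched leaf to the root is updated, so the
sibling `leaf_c` stays clean and would not need checking.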

So individual tiles can be significantly smaller than chunks in chunkfs.
Checking them is significantly faster than checking a chunk.  There will
be more dirty tiles at any given time, but a better way to look at it is
to say that for any dirty chunk in chunkfs, tilefs has some dirty and
some clean tiles.  So the overall ratio of dirty space is never higher
and almost always lower.
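
A small worked example of that ratio, again only a sketch: the sizes
below are made-up numbers, and it assumes that tiles are aligned so that
a whole number of tiles fits into each chunk-sized region.  The function
`dirty_space` is a hypothetical helper that counts how many blocks have
to be rechecked when every unit containing a dirty block is rechecked in
full.

```python
def dirty_space(dirty_blocks, unit_size):
    """Blocks to recheck if each unit holding a dirty block is checked whole."""
    dirty_units = {block // unit_size for block in dirty_blocks}
    return len(dirty_units) * unit_size


CHUNK = 1024   # hypothetical chunk size in blocks
TILE = 64      # hypothetical tile size in blocks; divides CHUNK evenly

writes = [5, 70, 3000]   # block numbers dirtied before a crash

chunk_dirty = dirty_space(writes, CHUNK)   # 2 dirty chunks
tile_dirty = dirty_space(writes, TILE)     # 3 dirty tiles
print(chunk_dirty, tile_dirty)             # 2048 192
```

There are more dirty units with tiles (3 versus 2), but far fewer dirty
blocks overall, since every dirty tile lies inside a chunk that would
have been dirty anyway.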

Overall I almost envy Matt for having this idea.  In hindsight it should
have been obvious to me.  But then again, in hindsight the fsck problem
and using divide and conquer should have been obvious to everyone and
iirc you were the only one who seriously pursued the idea and got all
this frenzy started. :)

Jörn

-- 
Rules of Optimization:
Rule 1: Don't do it.
Rule 2 (for experts only): Don't do it yet.
-- M.A. Jackson
