From: Valerie Henson <val_henson@linux.intel.com>
To: Ric Wheeler <ric@emc.com>
Cc: linux-fsdevel@vger.kernel.org, Arjan van de Ven <arjan@linux.intel.com>
Subject: Re: topics for the file system mini-summit
Date: Wed, 31 May 2006 19:19:09 -0700
Message-ID: <20060601021908.GL10420@goober>
In-Reply-To: <44762552.8000906@emc.com>

On Thu, May 25, 2006 at 02:44:50PM -0700, Ric Wheeler wrote:
> 
>    (1) repair/fsck time can take hours or even days depending on the
> health of the file system and its underlying disk as well as the number
> of files.  This does not work well for large servers and is a disaster
> for "appliances" that need to run these commands buried deep in some
> data center without a person watching...
>    (2) most file system performance testing is done on "pristine" file
> systems with very few files.  Performance over time, especially with
> very high file counts, degrades very noticeably on very large file
> systems.
>     (3) very poor fault containment for these very large devices - it
> would be great to be able to ride through a failure of a segment of the
> underlying storage without taking down the whole file system.
> 
> The obvious alternative to this is to break up these big disks into
> multiple small file systems, but there again we hit several issues.

Issues 1 and 3 are two of my main concerns, and what I want to focus
much of the workshop discussion on.  I view the question as: How do we
keep file system management simple while splitting the underlying
storage into isolated failure domains that can be repaired individually
online? (Say that three times fast.)  Just splitting up into multiple
file systems only solves the second of those problems (fault
containment), and only if you have forced umount, as you noted.

The approach we took in ZFS was to separate namespace management and
allocation management.  File systems aren't a fixed size; they take up
as much space as they need from a shared underlying pool.  You can
think of a file system in ZFS as a movable directory with management
bits attached.  I don't think this is the direction we should go, but
it's an example of separating your namespace management from a lot of
other stuff it doesn't really need to be attached to.
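
To make that namespace/allocation split concrete, here is a very loose
C sketch; the structs and field names are invented for this message and
are not taken from ZFS or anything else:

#include <stdint.h>

/* Allocation management: one shared pool of space for everything. */
struct space_pool {
	uint64_t total_blocks;
	uint64_t free_blocks;
	/* allocator metadata (space maps, free lists, ...) lives here */
};

/*
 * Namespace management: a "file system" is little more than a movable
 * directory tree with management bits attached; it owns no fixed
 * slice of the storage.
 */
struct fs_namespace {
	struct space_pool *pool;	/* all blocks come from the shared pool */
	uint64_t root_dir_obj;		/* root of this file system's tree */
	uint64_t quota_bytes;		/* management bits: an upper bound ... */
	uint64_t reserved_bytes;	/* ... and a guaranteed reservation */
};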

I don't think a block group is a good enough fault isolation domain -
think hard links, which can point across block group boundaries.  What
I think we need is normal file system
structures when you are referencing stuff inside your fault isolation
domain, and something more complicated if you have to reference stuff
outside.  One of Arjan's ideas involves something we're calling
continuation inodes - if the file's data is stored in multiple
domains, it has a separate continuation inode in each domain, and each
continuation inode has all the information necessary to run a full
fsck on the data inside that domain.  Similarly, if a directory has a
hard link to a file outside its domain, we'll have to allocate a
continuation inode and dir entry block in the domain containing the
file.  The idea is that you can run fsck on a domain without having to
go look outside that domain.  You may have to clean up a few things in
other domains, but those are easy to find and don't require running a
full fsck there.
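
To make the continuation inode idea a bit more concrete, here is a rough
C sketch of what one might carry on disk; every structure and field name
below is hypothetical, made up for this message rather than taken from
an existing design:

#include <stdint.h>

#define CONT_MAX_EXTENTS 12	/* arbitrary for the sketch */

/* An extent of file data that lives inside this fault isolation domain. */
struct cont_extent {
	uint64_t start_block;	/* relative to the domain, not the volume */
	uint32_t nr_blocks;
};

/*
 * One of these exists in every domain that holds data or directory
 * entries for the file.  Everything fsck needs to check this domain is
 * stored locally, so fsck never has to follow master_ino elsewhere.
 */
struct continuation_inode {
	uint64_t master_ino;	/* the file's "home" inode ... */
	uint32_t master_domain;	/* ... and the domain it lives in */
	uint32_t links_here;	/* hard links to the file from this domain */
	uint64_t bytes_here;	/* how much of the file's data is local */
	struct cont_extent extents[CONT_MAX_EXTENTS];
	/* checksums, indirect extent blocks, etc. would follow */
};

With something along these lines, a per-domain fsck can verify the local
extents and check links_here against the domain's own directory blocks;
if the home inode turns out to be gone, (master_domain, master_ino)
tells the cleanup pass exactly what to fix up in the other domain, which
is the easy-to-find cleanup mentioned above.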

-VAL
