From: Andreas Dilger <adilger@clusterfs.com>
To: Ric Wheeler <ric@emc.com>
Cc: Matthew Wilcox <matthew@wil.cx>, linux-fsdevel@vger.kernel.org
Subject: Re: topics for the file system mini-summit
Date: Mon, 29 May 2006 10:09:05 -0600
Message-ID: <20060529160905.GX5964@schatzie.adilger.int>
In-Reply-To: <447A5747.2020806@emc.com>
On May 28, 2006 22:07 -0400, Ric Wheeler wrote:
> I think that the namespace needs to present a normal file system set of
> operations - support for hardlinks, no magic directories, etc. so that
> applications don't need to load balance (or even be aware) of the
> sub-units that provide storage. If we removed that requirement, we
> would be back to today's collection of various file systems mounted on a
> single host.
>
> I know that lustre aggregates full file systems
Yes - we have a metadata-only filesystem which exports the inode numbers
and namespace, and then separate (essentially private) filesystems that
store all of the data. The object store filesystems do not export any
namespace that is visible to userspace.
> you could build a
> file system on top of a collection of disk partitions/LUN's and then
> your inode could be extended to encode the partition number and
> the internal mapping. You could even harden the block groups to the
> point that fsck could heal one group while the file system was (mostly?)
> online backed up by the rest of the block groups...
This is one thing that we have been thinking of for ext3. Instead of a
filesystem-wide "error" bit, we could track errors per group and mark only
the block or inode bitmap of a group in error when it fails its checksum.
Allocations from that group would then be blocked, which avoids further
potential corruption of the filesystem metadata.
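As a rough userspace illustration of the per-group error flag (not ext3
code), the sketch below keeps a CRC32 next to each group's block bitmap
and skips any group whose bitmap fails verification. The names Group,
make_group, verify and allocate are hypothetical, and the use of CRC32 is
an assumption; nothing above fixes a particular checksum.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Group:
    bitmap: bytearray       # block bitmap, one bit per block (1 = in use)
    checksum: int           # CRC32 of the bitmap at its last clean update
    in_error: bool = False  # per-group flag instead of a filesystem-wide one


def make_group(nblocks):
    """Create an empty group; nblocks is assumed to be a multiple of 8."""
    bitmap = bytearray(nblocks // 8)
    return Group(bitmap, zlib.crc32(bitmap))


def verify(group):
    """Flag the group if its bitmap no longer matches its checksum."""
    if zlib.crc32(group.bitmap) != group.checksum:
        group.in_error = True
    return not group.in_error


def allocate(groups):
    """Return (group, block) for the first free block in a healthy group."""
    for gno, group in enumerate(groups):
        if not verify(group):
            continue  # no allocations from a group marked in error
        for blk in range(len(group.bitmap) * 8):
            byte, bit = divmod(blk, 8)
            if not group.bitmap[byte] & (1 << bit):
                group.bitmap[byte] |= 1 << bit
                group.checksum = zlib.crc32(group.bitmap)
                return gno, blk
    return None  # every healthy group is full
```

With two groups where the first bitmap has been corrupted behind the
checksum's back, allocate() flags group 0 and hands out blocks from
group 1 only, while the rest of the filesystem stays usable.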
Once an error is detected, a filesystem service thread or a userspace
helper would walk the inode table (starting in the current group, which
is most likely to hold the relevant data), recreating the respective
bitmap and keeping a "valid bit" bitmap as well. Once all of the bits
in the "valid bit" bitmap are set, we can start using this group again.
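The rebuild pass could look roughly like the sketch below. It is only one
reading of the scheme: a bit is treated as valid as soon as an inode is
found that owns the block, and the remaining bits become valid (known
free) only after the whole inode table has been walked. The data layout
(inode_tables as per-group lists of inodes, each inode a list of global
block numbers) and the function name are assumptions for illustration.

```python
def rebuild_block_bitmap(inode_tables, bad_group, blocks_per_group):
    """Rebuild the block bitmap of bad_group from the inode tables.

    inode_tables[g] lists the inodes stored in group g; each inode is a
    list of the global block numbers it owns (a hypothetical layout).
    Returns (rebuilt, valid) as lists of booleans, one per block.
    """
    ngroups = len(inode_tables)
    first = bad_group * blocks_per_group
    rebuilt = [False] * blocks_per_group  # True = block is in use
    valid = [False] * blocks_per_group    # True = state of this bit is known

    # Start with the damaged group itself, then wrap around the others.
    for step in range(ngroups):
        group = (bad_group + step) % ngroups
        for inode_blocks in inode_tables[group]:
            for blk in inode_blocks:
                if first <= blk < first + blocks_per_group:
                    rebuilt[blk - first] = True
                    valid[blk - first] = True

    # The walk is complete, so every unclaimed block is known to be free.
    valid = [True] * blocks_per_group
    return rebuilt, valid


def group_usable(valid):
    """The group may be used for allocation once every bit is valid."""
    return all(valid)
```

A real implementation would also have to deal with allocations and frees
that race with the walk, and with the inode bitmap; both are left out
here.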
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.