From: David Howells <dhowells@redhat.com>
To: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "John T. Kohl" <jtk@us.ibm.com>,
dhowells@redhat.com, nfsv4 <nfsv4@linux-nfs.org>,
fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [RFC] Support for stackable file systems on top of nfs
Date: Mon, 14 Nov 2005 15:56:01 +0000
Message-ID: <17811.1131983761@warthog.cambridge.redhat.com>
In-Reply-To: <1131676316.8804.93.camel@lade.trondhjem.org>
Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> > CODA certainly won't work today with NFS host inodes and mapped files.
> > I'm not surprised nobody noticed, since that seems like a poor way to
> > use CODA. Using NFS backing store is a primary use case for ClearCase
> > MVFS, so we noticed.
>
> It sounds to me like you want to talk to the cachefs folks. They too
> need special hooks in the NFS low-level page cache routines in order to
> be able to mirror write requests to the local backing store and/or
> reroute read requests to that backing store.
>
> David?
There are a number of reasons I don't want to use i_mapping redirection to
support caching, as nice as it may seem to do that:
(1) Most filesystems don't do hole reportage. Holes in files are treated as
blocks of zeros and can't be distinguished otherwise.
(2) The backing inode must be fully populated before being exposed
to userspace through the main inode because the VM/VFS goes directly to
the backing inode and does not interrogate the front inode on VM ops.
Therefore:
(a) The backing inode must fit entirely within the cache.
(b) All backed files currently open must fit entirely within the cache at
the same time.
(c) A working set of files in total larger than the cache may not be
cached.
(d) A file may not grow larger than the available space in the cache.
(e) A file that's open and cached, and that remotely grows larger than the
cache, is potentially stuffed.
(3) Writes go to the backing filesystem, and can only be transferred to the
network when the file is closed.
(4) There's no record of what changes have been made, so the whole file must
be written back.
(5) The pages belong to the backing filesystem, and all metadata associated
with those pages is relevant only to the backing filesystem, and not to
anything stacked atop it.
Reading through i_mapping is fun, especially when a normal filesystem is used:
(1) You cannot, for the most part, detect holes, and so you can't use holes
to denote as-yet unfetched blocks.
(2) You don't want a page attached to the netfs that has a duplicate attached
to the backing fs.
(3) It isn't possible to share a page between two filesystems. Both of them
tend to attempt to assert control over the metadata of the page.
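To illustrate point (1), here is a minimal userspace sketch in Python (not kernel code, and not taken from FS-Cache). It assumes a 4096-byte block size and uses the hypothetical file names "holey" and "zeroed". One file has a never-written first block; the other has a first block explicitly filled with zeros. An ordinary read() cannot tell the two apart.

```python
import os
import tempfile

BLOCK = 4096  # assumed block size, for illustration only


def read_block(path, index):
    """Return one BLOCK-sized block of the file via an ordinary read()."""
    with open(path, "rb") as f:
        f.seek(index * BLOCK)
        return f.read(BLOCK)


def make_file_with_hole(path):
    # Skip block 0 without writing it, then write block 1.
    # On a filesystem with sparse-file support, block 0 is a hole.
    with open(path, "wb") as f:
        f.seek(BLOCK)
        f.write(b"D" * BLOCK)


def make_file_with_zeros(path):
    # Explicitly store a block of zero bytes, then the same data block.
    with open(path, "wb") as f:
        f.write(b"\0" * BLOCK)
        f.write(b"D" * BLOCK)


tmpdir = tempfile.mkdtemp()
holey = os.path.join(tmpdir, "holey")
zeroed = os.path.join(tmpdir, "zeroed")
make_file_with_hole(holey)
make_file_with_zeros(zeroed)

# Both blocks come back as zeros, so "never written" and "written as
# zeros" look the same to a reader that only sees file contents.
hole_block = read_block(holey, 0)
zero_block = read_block(zeroed, 0)
```

A cache that stored fetched data in such a file could not use "reads as zeros" to mean "not fetched yet", because real file data may legitimately be all zeros.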
What I do with FS-Cache/CacheFS is to say that the netfs owns the page, and
that the cache will read or write the netfs's page directly. The cache will
assume that a block it has not yet been given (a hole) is data not yet
retrieved from the network.
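As a rough model of that arrangement, the following Python sketch uses invented names (ToyCache, ToyNetfs, read_into, write_from) that are not part of the FS-Cache API. The netfs allocates and owns the page buffer; the cache only copies into or out of it, and a block the cache has never been given counts as not yet fetched.

```python
BLOCK = 4096  # assumed block size, for illustration only


class ToyCache:
    """Hypothetical block cache: a missing entry means 'not fetched yet'."""

    def __init__(self):
        self.blocks = {}

    def read_into(self, index, page):
        """Fill the netfs's page from the cache; False means a hole."""
        data = self.blocks.get(index)
        if data is None:
            return False
        page[:] = data
        return True

    def write_from(self, index, page):
        """Store a copy of the netfs's page in the cache."""
        self.blocks[index] = bytes(page)


class ToyNetfs:
    """Hypothetical network filesystem that owns its pages."""

    def __init__(self, server_blocks, cache):
        self.server = server_blocks  # stands in for the remote server
        self.cache = cache
        self.network_reads = 0

    def read_page(self, index):
        page = bytearray(BLOCK)  # the page belongs to the netfs
        if not self.cache.read_into(index, page):
            # Hole in the cache: fetch from the network, then cache it.
            page[:] = self.server[index]
            self.network_reads += 1
            self.cache.write_from(index, page)
        return bytes(page)
```

Because presence is tracked by the cache itself rather than inferred from the block's contents, a block that really is all zeros on the server is fetched once and then served from the cache like any other.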
Writing through i_mapping is also fun, particularly if you have shared
writable mappings available.
(1) With shared-mmap you don't know what's changed.
(2) With write you can at least determine what's changed, though it may be
tricky to keep track of what has been written to the cache yet.
(3) You can't use prepare_write and commit_write... they belong to the
underlying FS.
(4) You may have to write the entire file back if it's been changed.
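The contrast between points (1), (2) and (4) can be sketched with a toy model in Python. The names TrackedFile, mmap_store and writeback_ranges are hypothetical. The untracked_store flag is a modelling shortcut meaning "something was stored through a shared writable mapping"; it is not a claim about how the kernel records this.

```python
def merge_ranges(ranges):
    """Coalesce overlapping or adjacent (start, end) byte ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


class TrackedFile:
    """Hypothetical cached file that records what write() touched."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.dirty = []               # ranges seen by write()
        self.untracked_store = False  # set by stores nobody observed

    def write(self, offset, buf):
        # write() passes through code that can note offset and length.
        self.data[offset:offset + len(buf)] = buf
        self.dirty.append((offset, offset + len(buf)))

    def mmap_store(self, offset, buf):
        # A store through a shared mapping changes the data with no
        # call that could record which bytes changed.
        self.data[offset:offset + len(buf)] = buf
        self.untracked_store = True

    def writeback_ranges(self):
        """Byte ranges that must be sent back to the server."""
        if self.untracked_store:
            return [(0, len(self.data))]  # no record: whole file
        return merge_ranges(self.dirty)
```

With only write() calls the model sends back just the merged dirty ranges; a single untracked store forces the whole file to be written back.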
With FS-Cache/CacheFS the pages belong to the netfs. We use a second page bit
(PG_fs_misc) to keep track of data being written to the cache in addition to
PG_writeback - which tracks data being written to the network.
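A small Python sketch of the two-bit idea follows. The bit positions and method names are made up for illustration and do not match the kernel's page-flags definitions; only the names PG_writeback and PG_fs_misc come from the text above.

```python
PG_WRITEBACK = 1 << 0  # page is being written to the network
PG_FS_MISC = 1 << 1    # page is being written to the cache


class Page:
    """Hypothetical page owned by the netfs, with two independent bits."""

    def __init__(self):
        self.flags = 0

    def start_network_write(self):
        self.flags |= PG_WRITEBACK

    def end_network_write(self):
        self.flags &= ~PG_WRITEBACK

    def start_cache_write(self):
        self.flags |= PG_FS_MISC

    def end_cache_write(self):
        self.flags &= ~PG_FS_MISC

    def network_write_pending(self):
        return bool(self.flags & PG_WRITEBACK)

    def cache_write_pending(self):
        return bool(self.flags & PG_FS_MISC)

    def busy(self):
        # The page must stay put until both writes have completed.
        return self.flags != 0
```

Since each bit is set and cleared on its own, the network write and the cache write can finish in either order, and the page stays busy until both are done.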
The big problem is that a page cannot belong to several filesystems at once,
and cannot hold metadata for those filesystems all at the same time.
David