linux-fsdevel.vger.kernel.org archive mirror
From: David Howells <dhowells@redhat.com>
To: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "John T. Kohl" <jtk@us.ibm.com>,
	dhowells@redhat.com, nfsv4 <nfsv4@linux-nfs.org>,
	fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [RFC] Support for stackable file systems on top of nfs
Date: Mon, 14 Nov 2005 15:56:01 +0000
Message-ID: <17811.1131983761@warthog.cambridge.redhat.com>
In-Reply-To: <1131676316.8804.93.camel@lade.trondhjem.org>

Trond Myklebust <trond.myklebust@fys.uio.no> wrote:

> > CODA certainly won't work today with NFS host inodes and mapped files.
> > I'm not surprised nobody noticed, since that seems like a poor way to
> > use CODA.  Using NFS backing store is a primary use case for ClearCase
> > MVFS, so we noticed.
> 
> It sounds to me like you want to talk to the cachefs folks. They too
> need special hooks in the NFS low-level page cache routines in order to
> be able to mirror write requests to the local backing store and/or
> reroute read requests to that backing store.
> 
> David?

There are a number of reasons I don't want to use i_mapping redirection to
support caching, as nice as it may seem to do that:

 (1) Most filesystems don't report holes. Holes in files are treated as
     blocks of zeros and can't be distinguished otherwise.

 (2) The backing inode must be fully populated before being exposed
     to userspace through the main inode because the VM/VFS goes directly to
     the backing inode and does not interrogate the front inode on VM ops.

     Therefore:

     (a) The backing inode must fit entirely within the cache.

     (b) All backed files currently open must fit entirely within the cache at
     	 the same time.

     (c) A working set of files in total larger than the cache may not be
     	 cached.

     (d) A file may not grow larger than the available space in the cache.

     (e) A file that's open and cached, and that then grows remotely to be
         larger than the cache, is potentially stuffed.

 (3) Writes go to the backing filesystem, and can only be transferred to the
     network when the file is closed.

 (4) There's no record of what changes have been made, so the whole file must
     be written back.

 (5) The pages belong to the backing filesystem, and all metadata associated
     with those pages is relevant only to the backing filesystem, and not to
     anything stacked atop it.


Reading through i_mapping is fun, especially when a normal filesystem is used:

 (1) You cannot, for the most part, detect holes, and so you can't use holes
     to denote as-yet unfetched blocks.

 (2) You don't want a page attached to the netfs that has a duplicate attached
     to the backing fs.

 (3) It isn't possible to share a page between two filesystems. Both of them
     tend to attempt to assert control over the metadata of the page.
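
As a rough illustration of point (1), the following user-space sketch (not
kernel code) compares a file whose first block was never written with a file
whose first block was explicitly zero-filled. The 4096-byte block size and all
function names are assumptions made for the example. Through plain reads the
two files look the same, which is why a hole cannot, by itself, mark a block
as unfetched.

```python
import os
import tempfile

BLOCK = 4096  # assumed block size, for illustration only


def make_sparse(path):
    # Skip block 0 entirely (a hole where the filesystem supports sparse
    # files), then write one block of data after it.
    with open(path, "wb") as f:
        f.seek(BLOCK)
        f.write(b"x" * BLOCK)


def make_zeroed(path):
    # Write block 0 explicitly as zeros, followed by the same data block.
    with open(path, "wb") as f:
        f.write(b"\0" * BLOCK)
        f.write(b"x" * BLOCK)


def read_first_block(path):
    with open(path, "rb") as f:
        return f.read(BLOCK)


def holes_look_like_zeros():
    # True if a plain read cannot tell the hole from the zero-filled block.
    with tempfile.TemporaryDirectory() as d:
        sparse = os.path.join(d, "sparse")
        zeroed = os.path.join(d, "zeroed")
        make_sparse(sparse)
        make_zeroed(zeroed)
        return read_first_block(sparse) == read_first_block(zeroed)
```

Calling holes_look_like_zeros() creates both files in a temporary directory
and compares what a reader sees in the first block of each.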

What I do with FS-Cache/CacheFS is to say that the netfs owns the page, and
that the cache will read or write the netfs's page directly. The cache will
assume that a block it has not yet been given (a hole) is data not yet
retrieved from the network.
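
The arrangement described above can be modelled, very loosely, in user space.
This is a toy sketch and not the FS-Cache API: ToyCache, ToyNetfs, readpage
and the dictionary standing in for the server are all invented for the
example. It shows the netfs allocating and owning the page, the cache filling
that same page in place, and a missing cache block being treated as data not
yet fetched rather than as zeros.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class ToyCache:
    """Toy backing cache: an absent index is a hole, i.e. not yet fetched."""

    def __init__(self):
        self._blocks = {}  # page index -> bytes

    def read_page(self, index, page):
        # Fill the caller's (netfs-owned) page in place; report a hole
        # by returning False instead of handing back zeros.
        data = self._blocks.get(index)
        if data is None:
            return False
        page[:] = data
        return True

    def write_page(self, index, page):
        # Copy out of the netfs-owned page; the cache keeps no page itself.
        self._blocks[index] = bytes(page)


class ToyNetfs:
    """Toy network filesystem; 'server' is a dict of index -> page bytes."""

    def __init__(self, server, cache):
        self.server = server
        self.cache = cache
        self.network_reads = 0

    def readpage(self, index):
        page = bytearray(PAGE_SIZE)  # the page belongs to the netfs
        if not self.cache.read_page(index, page):
            # Hole in the cache: fetch from the network, then let the
            # cache store a copy of the netfs's page.
            page[:] = self.server[index]
            self.network_reads += 1
            self.cache.write_page(index, page)
        return page
```

In this model a page that genuinely contains zeros is still recorded as
present once fetched, so a second readpage() on it is served from the cache
and network_reads does not increase.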


Writing through i_mapping is also fun, particularly if you have shared
writable mappings available.

 (1) With shared-mmap you don't know what's changed.

 (2) With write you can at least determine what's changed, though it may be
     tricky to keep track of what has not yet been written to the cache.

 (3) You can't use prepare_write and commit_write, as they belong to the
     underlying FS.

 (4) You may have to write the entire file back if it's been changed.

With FS-Cache/CacheFS the pages belong to the netfs. We use a second page bit
(PG_fs_misc) to keep track of data being written to the cache in addition to
PG_writeback - which tracks data being written to the network.
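
The two-bit bookkeeping can be sketched as a toy user-space model. The bit
values below are made up and are not the kernel's; ToyPage and its methods are
hypothetical names. The rule that a page may only be released once both bits
are clear is an assumption of this sketch, included to show why the two writes
need to be tracked independently.

```python
# Hypothetical bit positions, chosen for the example only.
PG_WRITEBACK = 1 << 0  # stands in for PG_writeback: write to network in flight
PG_FS_MISC = 1 << 1    # stands in for PG_fs_misc: write to cache in flight


class ToyPage:
    """A netfs-owned page carrying one flag per outstanding write."""

    def __init__(self):
        self.flags = 0

    def start_network_write(self):
        self.flags |= PG_WRITEBACK

    def end_network_write(self):
        self.flags &= ~PG_WRITEBACK

    def start_cache_write(self):
        self.flags |= PG_FS_MISC

    def end_cache_write(self):
        self.flags &= ~PG_FS_MISC

    def can_release(self):
        # Assumption of this sketch: the page stays pinned while either
        # the network write or the cache write is still outstanding.
        return self.flags & (PG_WRITEBACK | PG_FS_MISC) == 0
```

Because each write has its own bit, the network write and the cache write can
complete in either order without one losing track of the other.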


The big problem is that a page cannot belong to several filesystems at once,
and cannot hold metadata for those filesystems all at the same time.

David
