From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Howells
Subject: Re: [RFC] Support for stackable file systems on top of nfs
Date: Mon, 14 Nov 2005 15:56:01 +0000
Message-ID: <17811.1131983761@warthog.cambridge.redhat.com>
References: <1131676316.8804.93.camel@lade.trondhjem.org>
	<1131643942.9389.17.camel@kleikamp.austin.ibm.com>
	<20051110200741.GA23192@infradead.org>
	<200511102135.jAALZlfS016100@sumu.lexma.ibm.com>
Cc: "John T. Kohl", dhowells@redhat.com, nfsv4, fsdevel
Return-path:
Received: from mx1.redhat.com ([66.187.233.31]:25561 "EHLO mx1.redhat.com")
	by vger.kernel.org with ESMTP id S1751166AbVKNP4L (ORCPT );
	Mon, 14 Nov 2005 10:56:11 -0500
In-Reply-To: <1131676316.8804.93.camel@lade.trondhjem.org>
To: Trond Myklebust
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

Trond Myklebust wrote:

> > CODA certainly won't work today with NFS host inodes and mapped files.
> > I'm not surprised nobody noticed, since that seems like a poor way to
> > use CODA. Using NFS backing store is a primary use case for ClearCase
> > MVFS, so we noticed.
>
> It sounds to me like you want to talk to the cachefs folks. They too
> need special hooks in the NFS low-level page cache routines in order to
> be able to mirror write requests to the local backing store and/or
> reroute read requests to that backing store.
>
> David?

There are a number of reasons I don't want to use i_mapping redirection
to support caching, as nice as it may seem to do that:

 (1) Most filesystems don't do hole reportage. Holes in files are treated
     as blocks of zeros and can't be distinguished otherwise.

 (2) The backing inode must be fully populated before being exposed to
     userspace through the main inode because the VM/VFS goes directly to
     the backing inode and does not interrogate the front inode on VM ops.

     Therefore:

     (a) The backing inode must fit entirely within the cache.

     (b) All backed files currently open must fit entirely within the
         cache at the same time.
     (c) A working set of files in total larger than the cache may not be
         cached.

     (d) A file may not grow larger than the available space in the cache.

     (e) A file that's open and cached, and remotely grows larger than the
         cache is potentially stuffed.

 (3) Writes go to the backing filesystem, and can only be transferred to
     the network when the file is closed.

 (4) There's no record of what changes have been made, so the whole file
     must be written back.

 (5) The pages belong to the backing filesystem, and all metadata
     associated with those pages is relevant only to the backing
     filesystem, and not anything stacked atop it.

Reading through i_mapping is fun, especially when a normal filesystem is
used:

 (1) You cannot, for the most part, detect holes, and so you can't use
     holes to denote as-yet unfetched blocks.

 (2) You don't want a page attached to the netfs that has a duplicate
     attached to the backing fs.

 (3) It isn't possible to share a page between two filesystems. Both of
     them tend to attempt to assert control over the metadata of the page.

What I do with FS-Cache/CacheFS is to say that the netfs owns the page,
and that the cache will read or write the netfs's page directly. The cache
will assume that a block it has not yet been given (a hole) is data not
yet retrieved from the network.

Writing through i_mapping is also fun, particularly if you have shared
writable mappings available.

 (1) With shared-mmap you don't know what's changed.

 (2) With write you can at least determine what's changed, though it may
     be tricky to keep track of what has been written to the cache so far.

 (3) You can't use prepare_write and commit_write... they belong to the
     underlying FS.

 (4) You may have to write the entire file back if it's been changed.

With FS-Cache/CacheFS the pages belong to the netfs. We use a second page
bit (PG_fs_misc) to keep track of data being written to the cache in
addition to PG_writeback - which tracks data being written to the network.
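
As a rough userspace illustration of the two-bit idea (this is not the
FS-Cache code; the demo_* names and flag values are made up for the
example, and the rule that a page is only reusable once both bits are
clear is my assumption about how the bits would be consumed):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's page flags; real values differ. */
enum {
	DEMO_PG_writeback = 1u << 0,	/* data being written to the network */
	DEMO_PG_fs_misc   = 1u << 1,	/* data being written to the cache */
};

/* A page owned by the netfs; the cache only reads or writes it. */
struct demo_page {
	unsigned int flags;
};

void demo_start_net_write(struct demo_page *page)
{
	page->flags |= DEMO_PG_writeback;
}

void demo_end_net_write(struct demo_page *page)
{
	assert(page->flags & DEMO_PG_writeback);
	page->flags &= ~DEMO_PG_writeback;
}

void demo_start_cache_write(struct demo_page *page)
{
	page->flags |= DEMO_PG_fs_misc;
}

void demo_end_cache_write(struct demo_page *page)
{
	assert(page->flags & DEMO_PG_fs_misc);
	page->flags &= ~DEMO_PG_fs_misc;
}

/* Assumed rule: the page may be reused only when neither write is pending. */
bool demo_page_reusable(const struct demo_page *page)
{
	return (page->flags & (DEMO_PG_writeback | DEMO_PG_fs_misc)) == 0;
}
```

The two writes are tracked independently on the one page, so either can
finish first without the netfs losing track of the other.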

The big problem is that a page cannot belong to several filesystems at
once, and cannot hold metadata for those filesystems all at the same time.

David