[PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]

public inbox for linux-kernel@vger.kernel.org

* [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing  [try #13]
@ 2006-08-30 19:31 David Howells
  2006-08-30 19:31 ` [PATCH 1/7] FS-Cache: Provide a filesystem-specific sync'able page bit " David Howells
                   ` (5 more replies)
  0 siblings, 6 replies; 70+ messages in thread
From: David Howells @ 2006-08-30 19:31 UTC
  To: torvalds, akpm, steved, trond.myklebust
  Cc: linux-fsdevel, linux-cachefs, nfsv4, linux-kernel



These patches add local caching for network filesystems such as NFS and AFS.

The patches can be grouped as:

 (A) 01-04

     Filesystem caching, including support for AFS.

 (B) 05

     Filesystem caching support for NFS; depends on (A) and upon the superblock
     sharing patches in Trond's tree.

 (C) 06-07

     CacheFiles: cache on files backend; depends on (A).

Note to Andrew Morton: I have not included the 64-bit inode number patches, the
dentry destruction patches or any NFS superblock sharing fix patches in this
patch set.

---
Changes in [try #13]:

 (*) [PATCH] FS-Cache: Generic filesystem caching facility

     [*] Don't give a warning if the cache backend returns ENOBUFS to
     	 __fscache_acquire_cookie() - that just indicates the cache couldn't
     	 find space to store the object.

 (*) [PATCH] FS-Cache: Make kAFS use FS-Cache

     [*] Don't examine PG_fs_misc if not caching.

 (*) [PATCH] NFS: Use local caching

     [*] Make nfs_fscache_release_page() compile if not caching (and return
     	 true).

 (*) [PATCH] FS-Cache: CacheFiles: A cache that backs onto a mounted filesystem

     [*] Clean up format warnings due to 64-bit types vs %llx.

     [*] Don't present error obtained in cachefiles_write_page() to netfs
     	 twice; the netfs callback shouldn't be invoked as we can return the
     	 error directly.

     [*] Handle ENOSPC obtained from the backing filesystem by translating it
     	 into ENOBUFS.

     [*] Maintain a quantity of free files/inodes in the cache in addition to a
     	 certain amount of free space.  Cull the cache if there are
     	 insufficient free files.  Permit the limits on that to be configured
     	 too.

     [*] The cachefilesd package has been updated to v0.6 to fix a number of
     	 bugs and also to document the new options to control the quantity of
     	 files kept available.  See:

		http://people.redhat.com/~dhowells/fscache/
		http://people.redhat.com/steved/fscache/
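
As a rough illustration of the ENOSPC handling listed above, the translation
might look like the sketch below.  The helper name cachefiles_xlate_error() is
hypothetical and is not taken from the patches; only the ENOSPC-to-ENOBUFS
mapping itself comes from the changelog.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper (not from the patch set): map an error returned by
 * the backing filesystem onto the error reported to the netfs.
 */
static int cachefiles_xlate_error(int err)
{
	/* Kernel convention assumed here: 0 on success, -errno on failure. */
	assert(err <= 0);

	/* A full backing filesystem just means "no cache space available". */
	if (err == -ENOSPC)
		return -ENOBUFS;
	return err;
}
```

All other errors pass through unchanged, so the netfs sees ENOBUFS only as a
hint that the object could not be cached.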

Changes in [try #12]:

 (*) [PATCH] FS-Cache: Release page->private after failed readahead
     [PATCH] FS-Cache: Make kAFS use FS-Cache
     [PATCH] NFS: Use local caching

     [*] Use invalidatepage() rather than releasepage() to forcibly invalidate
     	 a page that we failed to add to the pagecache.

     [*] Make AFS and NFS's releasepage() ops return false if the page is busy
     	 interacting with the cache rather than waiting for the cache
     	 interaction to complete.  invalidatepage() still waits.

 (*) [PATCH] FS-Cache: CacheFiles: A cache that backs onto a mounted filesystem

     [*] Fix a printk format warning.

 (*) [PATCH] NFS: Use local caching

     [*] Handle upstream changes to base version of nfs_release_page().

Changes in [try #11]:

 (*) Split up the NFS superblock sharing patches into a set of smaller
     patches and reworked some of the contents as per Trond's suggestions.

 (*) [PATCH] NFS: Fix error handling

     [*] Fix error handling in earlier patches (the earlier patches are also in
     	 Trond's NFS tree, so I haven't rolled this in for the moment).

 (*) [PATCH] NFS: Secure the roots of the NFS subtrees in a shared superblock

     [*] Initialise the security on detached NFS roots manually since they're
     	 allocated with dcache_alloc_anon() not dcache_alloc_root().

 (*) [PATCH] FS-Cache: CacheFiles: A cache that backs onto a mounted filesystem

     [*] Don't use file structs when accessing the data storage backing files.
     	 Pass NULL as the file argument to prepare_write() and commit_write()
     	 calls.

     [*] Check for a bmap() inode op to prevent NFS being used as the cache
     	 backing store (and besides, we need bmap() available anyway).

     [*] Make the calls to the statfs() superblock op supply a dentry not a
     	 vfsmount.

     [*] CONFIG_CACHEFILES_DEBUG permits _enter(), _debug() and _exit() to be
     	 enabled dynamically.

     [*] Debugging macros are checked by gcc for printf format compliance even
     	 when completely disabled.

 (*) [PATCH] FS-Cache: CacheFiles: ia64: missing copy_page export

     [*] Export copy_page() on IA-64 as we need that.

 (*) [PATCH] AUTOFS: Make sure all dentries refs are released before calling kill_anon_super()

     [*] Make sure autofs4 releases all its retained dentries in its kill_sb()
     	 op before calling kill_anon_super() rather than in the put_super() op.
     	 This prevents the next patch from oopsing it.

 (*) [PATCH] VFS: Destroy the dentries contributed by a superblock on unmounting

     [*] Optimise the destruction of the dentries attached to a superblock
     	 during unmounting.

Changes in [try #10]:

 (*) [PATCH] NFS: Permit filesystem to perform statfs with a known root dentry

     [*] Pass a dentry rather than a vfsmount to the statfs() op as the key by
     	 which to determine the filesystem.

 (*) [PATCH] NFS: Share NFS superblocks per-protocol per-server per-FSID

     [*] nfs4_pathname_string() needed an extra const.

 (*) [PATCH] FS-Cache: Release page->private in failed readahead

     [*] The comment header on the helper function is much expanded.  This
     	 states why there's a need to call the releasepage() op in the event of
     	 an error.

     [*] BUG() if the page is already locked when we try and lock it.

     [*] Don't set the page mapping pointer until we've locked the page.

     [*] The page is unlocked after try_to_release_page() is called.

 (*) The release-page patch now comes before the fscache-afs patch as well as
     the fscache-nfs patch.

Changes in [try #9]:

 (*) [PATCH] NFS: Permit filesystem to perform statfs with a known root dentry

     [*] Inclusions of linux/mount.h have been added where necessary to make
       	 allyesconfig build successfully.

 (*) [PATCH] NFS: Share NFS superblocks per-protocol per-server per-FSID

     [*] The exports from fs/namespace.c and fs/namei.c are no longer required.

 (*) [PATCH] FS-Cache: Release page->private in failed readahead

     [*] try_to_release_page() is called instead of calling the
     	 releasepage() op directly.

     [*] The page is locked before try_to_release_page() is called.

     [*] The calls to try_to_release_page() and page_cache_release() have been
     	 abstracted out into a helper function as this bit of code occurs
     	 twice.

---
In response to those who've asked, there are at least three reasons for
implementing superblock sharing:

 (1) As I understand what I've been told, NFSv4 requires a 1:1 mapping between
     server files and client files.  I suspect this has to do with the
     management of leases.

 (2) We can reduce the resource consumption on NFSv2 and NFSv3 clients as well
     as on NFSv4 clients by sharing superblocks that cover overlapping segments
     of the file space.

     Consider a machine that's used by a lot of people at the same time, each
     of whom has an automounted NFS homedir off of the same server - and in
     fact off of the same disk on that server.  Currently, with Linus's
     tree, each one will get a separate superblock to represent them; with
     Trond's tree, each one will still get a separate superblock unless they
     share the same root filehandle; and with my patches, they'll get the same
     superblock.

     If two homedirs have a hard link between them (unlikely, I know, but by no
     means impossible, and probably more likely with, say, data such as NFS
     mounted git repositories), then you have the possibility of aliasing.
     This means that you can have two or more inodes in core that refer to the
     same server object, and each of these inodes can have pages that refer to
     the same remote pages on the server - aliasing again.  You _have_ to have
     two inodes because they're covered by separate superblocks.

     Aliasing is bad, generally, because you end up using more storage than
     you need to (pagecache and inode cache in this case), and you have the
     problem of keeping them in sync.  It's also twice as hard to keep two
     inodes up to date when they change on the server as to keep one up to
     date.

     If you can use the same superblock where possible, then you can cut out
     aliasing on that client since you can share dentries that have the same
     file handle (hard links or subtrees).

     Part of the problem with NFSv2 and NFSv3 is that you invoke mountd to get
     the filehandle to a subtree, but you may not be able to work out how two
     different subtrees relate.  The getsb patch permits the superblock to
     have more than one root, which allows us to defer this problem until we
     see the root of one subtree cropping up in another subtree - at which
     point we can splice the former into the latter.

 (3) In my local file caching patches (FS-Cache), I have two reasons for
     wanting this:

     (a) Unique keys.  I need a unique key to find an object in the cache.  If
     	 we can get inode aliases, then I end up with several inodes referring
     	 to the same cache object.  This also means that I have to use a fair
     	 bit of extra memory to keep track of the multiple cookie mappings in
     	 FS-Cache, and have to compare keys a lot to find duplicate mappings.

	 If I can assume that the _netfs_ will manage the 1:1 mapping, I can
	 use a lot less memory and save some processing capacity also.

	 I don't want to invent random keys to differentiate aliased
	 superblocks or inodes as that destroys the persistence capabilities
	 of the cache across power failures and reboots.

     (b) Callbacks.  I want a callback that the netfs passes to FS-Cache to
     	 permit the cache to update the metadata in the cache from netfs
     	 metadata at convenient times.  However, if there's more than one
     	 inode alias in core, which one should the cache use?
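
The unique-key concern in (3)(a) can be pictured with the small userspace
sketch below.  It is only a model: the struct layouts, the fixed-size file
handle and the cache_lookup() helper are all invented for illustration and do
not reflect the FS-Cache API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cache key: one server object, however it was reached. */
struct obj_key {
	unsigned int  server;	/* server identity (illustrative) */
	unsigned int  fsid;	/* filesystem ID on that server */
	unsigned char fh[8];	/* file handle bytes (fixed size for brevity) */
};

struct cache_obj {
	struct obj_key key;
	int            users;	/* number of in-core inodes bound to this object */
};

#define MAX_OBJS 16
static struct cache_obj objs[MAX_OBJS];
static size_t nr_objs;

/* Compare keys field by field so struct padding can never matter. */
static int key_equal(const struct obj_key *a, const struct obj_key *b)
{
	return a->server == b->server &&
	       a->fsid == b->fsid &&
	       memcmp(a->fh, b->fh, sizeof(a->fh)) == 0;
}

/* Find the cache object for a key, creating it on first use. */
static struct cache_obj *cache_lookup(const struct obj_key *key)
{
	assert(key != NULL);

	for (size_t i = 0; i < nr_objs; i++) {
		if (key_equal(&objs[i].key, key)) {
			objs[i].users++;	/* an alias: second inode, same object */
			return &objs[i];
		}
	}
	if (nr_objs == MAX_OBJS)
		return NULL;		/* table full */

	objs[nr_objs].key = *key;
	objs[nr_objs].users = 1;
	return &objs[nr_objs++];
}
```

When two aliased inodes present the same key, both lookups land on one cache
object and its user count rises to two; that is the extra bookkeeping, and the
ambiguity about which inode should answer a callback, that a 1:1 mapping in the
netfs would avoid.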

AFS doesn't have anything like these problems because mounts are always made
from the root of a volume, and AFS was designed with local caching in mind.

The getsb and statfs patches are a consequence of NFS being permitted to mount
arbitrary subtrees from the server.

David


* [PATCH 1/7] FS-Cache: Provide a filesystem-specific sync'able page bit [try #13]
  2006-08-30 19:31 [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13] David Howells
@ 2006-08-30 19:31 ` David Howells
  2006-08-30 19:32 ` [PATCH 3/7] FS-Cache: Release page->private after failed readahead " David Howells
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 70+ messages in thread
From: David Howells @ 2006-08-30 19:31 UTC
  To: torvalds, akpm, steved, trond.myklebust
  Cc: linux-fsdevel, linux-cachefs, nfsv4, linux-kernel

The attached patch provides a filesystem-specific page bit that a filesystem
can synchronise upon.  This can be used, for example, by a netfs to synchronise
with CacheFS writing its pages to disk.

The PG_checked bit is replaced with PG_fs_misc, and various operations are
provided based upon that.  The *PageChecked() macros have also been replaced.

Signed-Off-By: David Howells <dhowells@redhat.com>
---
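
The set/test/clear protocol described above can be modelled in userspace as in
the sketch below.  This is an assumption-laden illustration using C11 atomics,
not kernel code: struct page_model and the helper names are invented, there is
no wait queue, and where the kernel's end_page_fs_misc() would BUG() on a clear
bit the model simply returns false.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define PG_fs_misc 8	/* same bit number as in the patch */

/* Hypothetical stand-in for struct page: just the flags word. */
struct page_model {
	atomic_ulong flags;
};

/* Model of PageFsMisc(): is the bit currently set? */
static bool page_fs_misc(struct page_model *page)
{
	return atomic_load(&page->flags) & (1UL << PG_fs_misc);
}

/* Model of TestSetPageFsMisc(): set the bit, return its previous state. */
static bool test_set_fs_misc(struct page_model *page)
{
	assert(page != NULL);
	unsigned long old = atomic_fetch_or(&page->flags, 1UL << PG_fs_misc);

	return old & (1UL << PG_fs_misc);
}

/*
 * Model of end_page_fs_misc(): clear the bit and report whether it was set.
 * The real function BUG()s if the bit was clear and then wakes any waiters.
 */
static bool end_fs_misc(struct page_model *page)
{
	unsigned long old = atomic_fetch_and(&page->flags,
					     ~(1UL << PG_fs_misc));

	return old & (1UL << PG_fs_misc);
}
```

A netfs would set the bit before handing a page to the cache for writing, and
the cache would call the end function once the write had completed, at which
point anything sleeping in wait_on_page_fs_misc() could proceed.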

 fs/afs/dir.c               |    5 +----
 fs/ext2/dir.c              |    6 +++---
 fs/ext3/inode.c            |   10 +++++-----
 fs/freevxfs/vxfs_subr.c    |    2 +-
 fs/reiserfs/inode.c        |   10 +++++-----
 fs/ufs/dir.c               |    6 +++---
 include/linux/page-flags.h |   15 ++++++++++-----
 include/linux/pagemap.h    |   11 +++++++++++
 mm/filemap.c               |   17 +++++++++++++++++
 mm/migrate.c               |    4 ++--
 mm/page_alloc.c            |    2 +-
 11 files changed, 59 insertions(+), 29 deletions(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index cf8a2cb..f1c965f 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -155,11 +155,9 @@ #endif
 		}
 	}
 
-	SetPageChecked(page);
 	return;
 
  error:
-	SetPageChecked(page);
 	SetPageError(page);
 
 } /* end afs_dir_check_page() */
@@ -191,8 +189,7 @@ static struct page *afs_dir_get_page(str
 		kmap(page);
 		if (!PageUptodate(page))
 			goto fail;
-		if (!PageChecked(page))
-			afs_dir_check_page(dir, page);
+		afs_dir_check_page(dir, page);
 		if (PageError(page))
 			goto fail;
 	}
diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index 92ea826..b0e7d9f 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -112,7 +112,7 @@ static void ext2_check_page(struct page 
 	if (offs != limit)
 		goto Eend;
 out:
-	SetPageChecked(page);
+	SetPageFsMisc(page);
 	return;
 
 	/* Too bad, we had an error */
@@ -152,7 +152,7 @@ Eend:
 		dir->i_ino, (page->index<<PAGE_CACHE_SHIFT)+offs,
 		(unsigned long) le32_to_cpu(p->inode));
 fail:
-	SetPageChecked(page);
+	SetPageFsMisc(page);
 	SetPageError(page);
 }
 
@@ -165,7 +165,7 @@ static struct page * ext2_get_page(struc
 		kmap(page);
 		if (!PageUptodate(page))
 			goto fail;
-		if (!PageChecked(page))
+		if (!PageFsMisc(page))
 			ext2_check_page(page);
 		if (PageError(page))
 			goto fail;
diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index c5ee9f0..81ebf9b 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -1527,12 +1527,12 @@ static int ext3_journalled_writepage(str
 		goto no_write;
 	}
 
-	if (!page_has_buffers(page) || PageChecked(page)) {
+	if (!page_has_buffers(page) || PageFsMisc(page)) {
 		/*
 		 * It's mmapped pagecache.  Add buffers and journal it.  There
 		 * doesn't seem much point in redirtying the page here.
 		 */
-		ClearPageChecked(page);
+		ClearPageFsMisc(page);
 		ret = block_prepare_write(page, 0, PAGE_CACHE_SIZE,
 					ext3_get_block);
 		if (ret != 0) {
@@ -1589,7 +1589,7 @@ static void ext3_invalidatepage(struct p
 	 * If it's a full truncate we just forget about the pending dirtying
 	 */
 	if (offset == 0)
-		ClearPageChecked(page);
+		ClearPageFsMisc(page);
 
 	journal_invalidatepage(journal, page, offset);
 }
@@ -1598,7 +1598,7 @@ static int ext3_releasepage(struct page 
 {
 	journal_t *journal = EXT3_JOURNAL(page->mapping->host);
 
-	WARN_ON(PageChecked(page));
+	WARN_ON(PageFsMisc(page));
 	if (!page_has_buffers(page))
 		return 0;
 	return journal_try_to_free_buffers(journal, page, wait);
@@ -1694,7 +1694,7 @@ out:
  */
 static int ext3_journalled_set_page_dirty(struct page *page)
 {
-	SetPageChecked(page);
+	SetPageFsMisc(page);
 	return __set_page_dirty_nobuffers(page);
 }
 
diff --git a/fs/freevxfs/vxfs_subr.c b/fs/freevxfs/vxfs_subr.c
index decac62..805bbb2 100644
--- a/fs/freevxfs/vxfs_subr.c
+++ b/fs/freevxfs/vxfs_subr.c
@@ -78,7 +78,7 @@ vxfs_get_page(struct address_space *mapp
 		kmap(pp);
 		if (!PageUptodate(pp))
 			goto fail;
-		/** if (!PageChecked(pp)) **/
+		/** if (!PageFsMisc(pp)) **/
 			/** vxfs_check_page(pp); **/
 		if (PageError(pp))
 			goto fail;
diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index 52f1e21..af4dd36 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -2344,7 +2344,7 @@ static int reiserfs_write_full_page(stru
 	struct buffer_head *head, *bh;
 	int partial = 0;
 	int nr = 0;
-	int checked = PageChecked(page);
+	int checked = PageFsMisc(page);
 	struct reiserfs_transaction_handle th;
 	struct super_block *s = inode->i_sb;
 	int bh_per_page = PAGE_CACHE_SIZE / s->s_blocksize;
@@ -2422,7 +2422,7 @@ static int reiserfs_write_full_page(stru
 	 * blocks we're going to log
 	 */
 	if (checked) {
-		ClearPageChecked(page);
+		ClearPageFsMisc(page);
 		reiserfs_write_lock(s);
 		error = journal_begin(&th, s, bh_per_page + 1);
 		if (error) {
@@ -2803,7 +2803,7 @@ static void reiserfs_invalidatepage(stru
 	BUG_ON(!PageLocked(page));
 
 	if (offset == 0)
-		ClearPageChecked(page);
+		ClearPageFsMisc(page);
 
 	if (!page_has_buffers(page))
 		goto out;
@@ -2844,7 +2844,7 @@ static int reiserfs_set_page_dirty(struc
 {
 	struct inode *inode = page->mapping->host;
 	if (reiserfs_file_data_log(inode)) {
-		SetPageChecked(page);
+		SetPageFsMisc(page);
 		return __set_page_dirty_nobuffers(page);
 	}
 	return __set_page_dirty_buffers(page);
@@ -2867,7 +2867,7 @@ static int reiserfs_releasepage(struct p
 	struct buffer_head *bh;
 	int ret = 1;
 
-	WARN_ON(PageChecked(page));
+	WARN_ON(PageFsMisc(page));
 	spin_lock(&j->j_dirty_buffers_lock);
 	head = page_buffers(page);
 	bh = head;
diff --git a/fs/ufs/dir.c b/fs/ufs/dir.c
index 7f0a0aa..e04327c 100644
--- a/fs/ufs/dir.c
+++ b/fs/ufs/dir.c
@@ -135,7 +135,7 @@ static void ufs_check_page(struct page *
 	if (offs != limit)
 		goto Eend;
 out:
-	SetPageChecked(page);
+	SetPageFsMisc(page);
 	return;
 
 	/* Too bad, we had an error */
@@ -173,7 +173,7 @@ Eend:
 		   "offset=%lu",
 		   dir->i_ino, (page->index<<PAGE_CACHE_SHIFT)+offs);
 fail:
-	SetPageChecked(page);
+	SetPageFsMisc(page);
 	SetPageError(page);
 }
 
@@ -187,7 +187,7 @@ static struct page *ufs_get_page(struct 
 		kmap(page);
 		if (!PageUptodate(page))
 			goto fail;
-		if (!PageChecked(page))
+		if (!PageFsMisc(page))
 			ufs_check_page(page);
 		if (PageError(page))
 			goto fail;
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 5748642..6e017b7 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -71,7 +71,7 @@ #define PG_lru			 5
 #define PG_active		 6
 #define PG_slab			 7	/* slab debug (Suparna wants this) */
 
-#define PG_checked		 8	/* kill me in 2.5.<early>. */
+#define PG_fs_misc		 8
 #define PG_arch_1		 9
 #define PG_reserved		10
 #define PG_private		11	/* Has something at ->private */
@@ -161,10 +161,6 @@ #else
 #define PageHighMem(page)	0 /* needed to optimize away at compile time */
 #endif
 
-#define PageChecked(page)	test_bit(PG_checked, &(page)->flags)
-#define SetPageChecked(page)	set_bit(PG_checked, &(page)->flags)
-#define ClearPageChecked(page)	clear_bit(PG_checked, &(page)->flags)
-
 #define PageReserved(page)	test_bit(PG_reserved, &(page)->flags)
 #define SetPageReserved(page)	set_bit(PG_reserved, &(page)->flags)
 #define ClearPageReserved(page)	clear_bit(PG_reserved, &(page)->flags)
@@ -263,4 +259,13 @@ static inline void set_page_writeback(st
 	test_set_page_writeback(page);
 }
 
+/*
+ * Filesystem-specific page bit testing
+ */
+#define PageFsMisc(page)		test_bit(PG_fs_misc, &(page)->flags)
+#define SetPageFsMisc(page)		set_bit(PG_fs_misc, &(page)->flags)
+#define TestSetPageFsMisc(page)		test_and_set_bit(PG_fs_misc, &(page)->flags)
+#define ClearPageFsMisc(page)		clear_bit(PG_fs_misc, &(page)->flags)
+#define TestClearPageFsMisc(page)	test_and_clear_bit(PG_fs_misc, &(page)->flags)
+
 #endif	/* PAGE_FLAGS_H */
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 0a2f5d2..82b2753 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -170,6 +170,17 @@ static inline void wait_on_page_writebac
 extern void end_page_writeback(struct page *page);
 
 /*
+ * Wait for filesystem-specific page synchronisation to complete
+ */
+static inline void wait_on_page_fs_misc(struct page *page)
+{
+	if (PageFsMisc(page))
+		wait_on_page_bit(page, PG_fs_misc);
+}
+
+extern void fastcall end_page_fs_misc(struct page *page);
+
+/*
  * Fault a userspace page into pagetables.  Return non-zero on a fault.
  *
  * This assumes that two userspace pages are always sufficient.  That's
diff --git a/mm/filemap.c b/mm/filemap.c
index b9a60c4..2e365d4 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -577,6 +577,23 @@ void fastcall __lock_page(struct page *p
 }
 EXPORT_SYMBOL(__lock_page);
 
+/*
+ * Note completion of filesystem specific page synchronisation
+ *
+ * This is used to allow a page to be written to a filesystem cache in the
+ * background without holding up the completion of readpage
+ */
+void fastcall end_page_fs_misc(struct page *page)
+{
+	smp_mb__before_clear_bit();
+	if (!TestClearPageFsMisc(page))
+		BUG();
+	smp_mb__after_clear_bit();
+	__wake_up_bit(page_waitqueue(page), &page->flags, PG_fs_misc);
+}
+
+EXPORT_SYMBOL(end_page_fs_misc);
+
 /**
  * find_get_page - find and get a page reference
  * @mapping: the address_space to search
diff --git a/mm/migrate.c b/mm/migrate.c
index 3f1e0c2..08c9fff 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -348,8 +348,8 @@ static void migrate_page_copy(struct pag
 		SetPageUptodate(newpage);
 	if (PageActive(page))
 		SetPageActive(newpage);
-	if (PageChecked(page))
-		SetPageChecked(newpage);
+	if (PageFsMisc(page))
+		SetPageFsMisc(newpage);
 	if (PageMappedToDisk(page))
 		SetPageMappedToDisk(newpage);
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 54a4f53..8bb003b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -550,7 +550,7 @@ static int prep_new_page(struct page *pa
 
 	page->flags &= ~(1 << PG_uptodate | 1 << PG_error |
 			1 << PG_referenced | 1 << PG_arch_1 |
-			1 << PG_checked | 1 << PG_mappedtodisk);
+			1 << PG_fs_misc | 1 << PG_mappedtodisk);
 	set_page_private(page, 0);
 	set_page_refcounted(page);
 	kernel_map_pages(page, 1 << order, 1);


* [PATCH 3/7] FS-Cache: Release page->private after failed readahead [try #13]
  2006-08-30 19:31 [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13] David Howells
  2006-08-30 19:31 ` [PATCH 1/7] FS-Cache: Provide a filesystem-specific sync'able page bit " David Howells
@ 2006-08-30 19:32 ` David Howells
  2006-08-30 19:32 ` [PATCH 4/7] FS-Cache: Make kAFS use FS-Cache " David Howells
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 70+ messages in thread
From: David Howells @ 2006-08-30 19:32 UTC
  To: torvalds, akpm, steved, trond.myklebust
  Cc: linux-fsdevel, linux-cachefs, nfsv4, linux-kernel

The attached patch causes read_cache_pages() to release page-private data on a
page for which add_to_page_cache() fails or the filler function fails. This
permits pages with caching references associated with them to be cleaned up.

The invalidatepage() address space op is called (indirectly) to do the honours.

Signed-Off-By: David Howells <dhowells@redhat.com>
---
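
The cleanup path described above can be sketched as a small userspace model.
Everything here is illustrative: struct page_model, fs_invalidate() and the
drop_unread_*() helpers are invented names that only mirror the shape of the
helpers added by this patch, with assert() standing in for BUG().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page. */
struct page_model {
	bool has_private;	/* stands in for PG_private */
	bool locked;		/* stands in for PG_locked */
	int  refcount;		/* references held on the page */
	int  invalidated;	/* times the fs cleanup hook ran */
};

/* Stand-in for the filesystem's invalidatepage() op. */
static void fs_invalidate(struct page_model *page)
{
	assert(page->locked);	/* the op expects a locked page */
	page->has_private = false;
	page->invalidated++;
}

/* Give the fs a chance to clean up, then drop our reference. */
static void drop_unread_page(struct page_model *page)
{
	if (page->has_private) {
		assert(!page->locked);	/* the patch BUG()s if already locked */
		page->locked = true;
		fs_invalidate(page);
		page->locked = false;
	}
	page->refcount--;	/* models page_cache_release() */
}

/* Release every page left over after a failed read. */
static void drop_unread_pages(struct page_model *pages, size_t n)
{
	for (size_t i = 0; i < n; i++)
		drop_unread_page(&pages[i]);
}
```

Only pages carrying private data take the lock-and-invalidate detour; plain
pages are simply released, as they were before the patch.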

 mm/readahead.c |   46 ++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index aa7ec42..b0788ae 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -14,6 +14,7 @@ #include <linux/module.h>
 #include <linux/blkdev.h>
 #include <linux/backing-dev.h>
 #include <linux/pagevec.h>
+#include <linux/buffer_head.h>
 
 void default_unplug_io_fn(struct backing_dev_info *bdi, struct page *page)
 {
@@ -117,6 +118,41 @@ static inline unsigned long get_next_ra_
 
 #define list_to_page(head) (list_entry((head)->prev, struct page, lru))
 
+/*
+ * see if a page needs releasing upon read_cache_pages() failure
+ * - the caller of read_cache_pages() may have set PG_private before calling,
+ *   such as the NFS fs marking pages that are cached locally on disk, thus we
+ *   need to give the fs a chance to clean up in the event of an error
+ */
+static void read_cache_pages_invalidate_page(struct address_space *mapping,
+					     struct page *page)
+{
+	if (PagePrivate(page)) {
+		if (TestSetPageLocked(page))
+			BUG();
+		page->mapping = mapping;
+		do_invalidatepage(page, 0);
+		page->mapping = NULL;
+		unlock_page(page);
+	}
+	page_cache_release(page);
+}
+
+/*
+ * release a list of pages, invalidating them first if need be
+ */
+static void read_cache_pages_invalidate_pages(struct address_space *mapping,
+					      struct list_head *pages)
+{
+	struct page *victim;
+
+	while (!list_empty(pages)) {
+		victim = list_to_page(pages);
+		list_del(&victim->lru);
+		read_cache_pages_invalidate_page(mapping, victim);
+	}
+}
+
 /**
  * read_cache_pages - populate an address space with some pages & start reads against them
  * @mapping: the address_space
@@ -140,20 +176,14 @@ int read_cache_pages(struct address_spac
 		page = list_to_page(pages);
 		list_del(&page->lru);
 		if (add_to_page_cache(page, mapping, page->index, GFP_KERNEL)) {
-			page_cache_release(page);
+			read_cache_pages_invalidate_page(mapping, page);
 			continue;
 		}
 		ret = filler(data, page);
 		if (!pagevec_add(&lru_pvec, page))
 			__pagevec_lru_add(&lru_pvec);
 		if (ret) {
-			while (!list_empty(pages)) {
-				struct page *victim;
-
-				victim = list_to_page(pages);
-				list_del(&victim->lru);
-				page_cache_release(victim);
-			}
+			read_cache_pages_invalidate_pages(mapping, pages);
 			break;
 		}
 	}


* [PATCH 4/7] FS-Cache: Make kAFS use FS-Cache [try #13]
  2006-08-30 19:31 [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13] David Howells
  2006-08-30 19:31 ` [PATCH 1/7] FS-Cache: Provide a filesystem-specific sync'able page bit " David Howells
  2006-08-30 19:32 ` [PATCH 3/7] FS-Cache: Release page->private after failed readahead " David Howells
@ 2006-08-30 19:32 ` David Howells
  2006-08-30 19:32 ` [PATCH 5/7] NFS: Use local caching " David Howells
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 70+ messages in thread
From: David Howells @ 2006-08-30 19:32 UTC
  To: torvalds, akpm, steved, trond.myklebust
  Cc: linux-fsdevel, linux-cachefs, nfsv4, linux-kernel

The attached patch makes the kAFS filesystem in fs/afs/ use FS-Cache, and
through it any attached caches.  The kAFS filesystem will use caching
automatically if it's available.

Signed-Off-By: David Howells <dhowells@redhat.com>
---

 fs/Kconfig         |    7 +
 fs/afs/cache.h     |   27 -----
 fs/afs/cell.c      |  109 +++++++++++++--------
 fs/afs/cell.h      |   16 +--
 fs/afs/cmservice.c |    2 
 fs/afs/dir.c       |   10 +-
 fs/afs/file.c      |  267 ++++++++++++++++++++++++++++++++++------------------
 fs/afs/fsclient.c  |    4 +
 fs/afs/inode.c     |   45 ++++++---
 fs/afs/internal.h  |   25 ++---
 fs/afs/main.c      |   24 ++---
 fs/afs/mntpt.c     |   12 +-
 fs/afs/proc.c      |    1 
 fs/afs/server.c    |    3 -
 fs/afs/vlocation.c |  179 +++++++++++++++++++++--------------
 fs/afs/vnode.c     |  250 ++++++++++++++++++++++++++++++++++++++++---------
 fs/afs/vnode.h     |   10 +-
 fs/afs/volume.c    |   78 ++++++---------
 fs/afs/volume.h    |   28 +----
 19 files changed, 673 insertions(+), 424 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index 1a3d179..eecc0ed 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -1922,6 +1922,13 @@ # for fs/nls/Config.in
 
 	  If unsure, say N.
 
+config AFS_FSCACHE
+	bool "Provide AFS client caching support"
+	depends on AFS_FS && FSCACHE && EXPERIMENTAL
+	help
+	  Say Y here if you want AFS data to be cached locally through the
+	  generic filesystem cache manager
+
 config RXRPC
 	tristate
 
diff --git a/fs/afs/cache.h b/fs/afs/cache.h
deleted file mode 100644
index 9eb7722..0000000
--- a/fs/afs/cache.h
+++ /dev/null
@@ -1,27 +0,0 @@
-/* cache.h: AFS local cache management interface
- *
- * Copyright (C) 2002 Red Hat, Inc. All Rights Reserved.
- * Written by David Howells (dhowells@redhat.com)
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-
-#ifndef _LINUX_AFS_CACHE_H
-#define _LINUX_AFS_CACHE_H
-
-#undef AFS_CACHING_SUPPORT
-
-#include <linux/mm.h>
-#ifdef AFS_CACHING_SUPPORT
-#include <linux/cachefs.h>
-#endif
-#include "types.h"
-
-#ifdef __KERNEL__
-
-#endif /* __KERNEL__ */
-
-#endif /* _LINUX_AFS_CACHE_H */
diff --git a/fs/afs/cell.c b/fs/afs/cell.c
index bfc1fd2..3aaeada 100644
--- a/fs/afs/cell.c
+++ b/fs/afs/cell.c
@@ -31,17 +31,21 @@ static DEFINE_RWLOCK(afs_cells_lock);
 static DECLARE_RWSEM(afs_cells_sem); /* add/remove serialisation */
 static struct afs_cell *afs_cell_root;
 
-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_cell_cache_match(void *target,
-						const void *entry);
-static void afs_cell_cache_update(void *source, void *entry);
-
-struct cachefs_index_def afs_cache_cell_index_def = {
-	.name			= "cell_ix",
-	.data_size		= sizeof(struct afs_cache_cell),
-	.keys[0]		= { CACHEFS_INDEX_KEYS_ASCIIZ, 64 },
-	.match			= afs_cell_cache_match,
-	.update			= afs_cell_cache_update,
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_cell_cache_get_key(const void *cookie_netfs_data,
+				       void *buffer, uint16_t buflen);
+static uint16_t afs_cell_cache_get_aux(const void *cookie_netfs_data,
+				       void *buffer, uint16_t buflen);
+static fscache_checkaux_t afs_cell_cache_check_aux(void *cookie_netfs_data,
+						   const void *buffer,
+						   uint16_t buflen);
+
+static struct fscache_cookie_def afs_cell_cache_index_def = {
+	.name		= "AFS cell",
+	.type		= FSCACHE_COOKIE_TYPE_INDEX,
+	.get_key	= afs_cell_cache_get_key,
+	.get_aux	= afs_cell_cache_get_aux,
+	.check_aux	= afs_cell_cache_check_aux,
 };
 #endif
 
@@ -115,12 +119,11 @@ int afs_cell_create(const char *name, ch
 	if (ret < 0)
 		goto error;
 
-#ifdef AFS_CACHING_SUPPORT
-	/* put it up for caching */
-	cachefs_acquire_cookie(afs_cache_netfs.primary_index,
-			       &afs_vlocation_cache_index_def,
-			       cell,
-			       &cell->cache);
+#ifdef CONFIG_AFS_FSCACHE
+	/* put it up for caching (this never returns an error) */
+	cell->cache = fscache_acquire_cookie(afs_cache_netfs.primary_index,
+					     &afs_cell_cache_index_def,
+					     cell);
 #endif
 
 	/* add to the cell lists */
@@ -345,8 +348,8 @@ static void afs_cell_destroy(struct afs_
 	list_del_init(&cell->proc_link);
 	up_write(&afs_proc_cells_sem);
 
-#ifdef AFS_CACHING_SUPPORT
-	cachefs_relinquish_cookie(cell->cache, 0);
+#ifdef CONFIG_AFS_FSCACHE
+	fscache_relinquish_cookie(cell->cache, 0);
 #endif
 
 	up_write(&afs_cells_sem);
@@ -525,44 +528,62 @@ void afs_cell_purge(void)
 
 /*****************************************************************************/
 /*
- * match a cell record obtained from the cache
+ * set the key for the index entry
  */
-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_cell_cache_match(void *target,
-						const void *entry)
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_cell_cache_get_key(const void *cookie_netfs_data,
+				       void *buffer, uint16_t bufmax)
 {
-	const struct afs_cache_cell *ccell = entry;
-	struct afs_cell *cell = target;
+	const struct afs_cell *cell = cookie_netfs_data;
+	uint16_t klen;
 
-	_enter("{%s},{%s}", ccell->name, cell->name);
+	_enter("%p,%p,%u", cell, buffer, bufmax);
 
-	if (strncmp(ccell->name, cell->name, sizeof(ccell->name)) == 0) {
-		_leave(" = SUCCESS");
-		return CACHEFS_MATCH_SUCCESS;
-	}
+	klen = strlen(cell->name);
+	if (klen > bufmax)
+		return 0;
+
+	memcpy(buffer, cell->name, klen);
+	return klen;
 
-	_leave(" = FAILED");
-	return CACHEFS_MATCH_FAILED;
-} /* end afs_cell_cache_match() */
+} /* end afs_cell_cache_get_key() */
 #endif
 
 /*****************************************************************************/
 /*
- * update a cell record in the cache
+ * provide new auxiliary cache data
  */
-#ifdef AFS_CACHING_SUPPORT
-static void afs_cell_cache_update(void *source, void *entry)
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_cell_cache_get_aux(const void *cookie_netfs_data,
+				       void *buffer, uint16_t bufmax)
 {
-	struct afs_cache_cell *ccell = entry;
-	struct afs_cell *cell = source;
+	const struct afs_cell *cell = cookie_netfs_data;
+	uint16_t dlen;
 
-	_enter("%p,%p", source, entry);
+	_enter("%p,%p,%u", cell, buffer, bufmax);
 
-	strncpy(ccell->name, cell->name, sizeof(ccell->name));
+	dlen = cell->vl_naddrs * sizeof(cell->vl_addrs[0]);
+	dlen = min(dlen, bufmax);
+	dlen &= ~(sizeof(cell->vl_addrs[0]) - 1);
 
-	memcpy(ccell->vl_servers,
-	       cell->vl_addrs,
-	       min(sizeof(ccell->vl_servers), sizeof(cell->vl_addrs)));
+	memcpy(buffer, cell->vl_addrs, dlen);
+
+	return dlen;
+
+} /* end afs_cell_cache_get_aux() */
+#endif
+
+/*****************************************************************************/
+/*
+ * check that the auxiliary data indicates that the entry is still valid
+ */
+#ifdef CONFIG_AFS_FSCACHE
+static fscache_checkaux_t afs_cell_cache_check_aux(void *cookie_netfs_data,
+						   const void *buffer,
+						   uint16_t buflen)
+{
+	_leave(" = OKAY");
+	return FSCACHE_CHECKAUX_OKAY;
 
-} /* end afs_cell_cache_update() */
+} /* end afs_cell_cache_check_aux() */
 #endif
diff --git a/fs/afs/cell.h b/fs/afs/cell.h
index 4834910..d670502 100644
--- a/fs/afs/cell.h
+++ b/fs/afs/cell.h
@@ -13,7 +13,7 @@ #ifndef _LINUX_AFS_CELL_H
 #define _LINUX_AFS_CELL_H
 
 #include "types.h"
-#include "cache.h"
+#include <linux/fscache.h>
 
 #define AFS_CELL_MAX_ADDRS 15
 
@@ -21,16 +21,6 @@ extern volatile int afs_cells_being_purg
 
 /*****************************************************************************/
 /*
- * entry in the cached cell catalogue
- */
-struct afs_cache_cell
-{
-	char			name[64];	/* cell name (padded with NULs) */
-	struct in_addr		vl_servers[15];	/* cached cell VL servers */
-};
-
-/*****************************************************************************/
-/*
  * AFS cell record
  */
 struct afs_cell
@@ -39,8 +29,8 @@ struct afs_cell
 	struct list_head	link;		/* main cell list link */
 	struct list_head	proc_link;	/* /proc cell list link */
 	struct proc_dir_entry	*proc_dir;	/* /proc dir for this cell */
-#ifdef AFS_CACHING_SUPPORT
-	struct cachefs_cookie	*cache;		/* caching cookie */
+#ifdef CONFIG_AFS_FSCACHE
+	struct fscache_cookie	*cache;		/* caching cookie */
 #endif
 
 	/* server record management */
diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c
index 3d097fd..f87d5a7 100644
--- a/fs/afs/cmservice.c
+++ b/fs/afs/cmservice.c
@@ -24,7 +24,7 @@ #include "cmservice.h"
 #include "internal.h"
 
 static unsigned afscm_usage;		/* AFS cache manager usage count */
-static struct rw_semaphore afscm_sem;	/* AFS cache manager start/stop semaphore */
+static DECLARE_RWSEM(afscm_sem);	/* AFS cache manager start/stop semaphore */
 
 static int afscm_new_call(struct rxrpc_call *call);
 static void afscm_attention(struct rxrpc_call *call);
diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index f1c965f..94afb75 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -145,7 +145,7 @@ #endif
 	qty /= sizeof(union afs_dir_block);
 
 	/* check them */
-	dbuf = page_address(page);
+	dbuf = kmap_atomic(page, KM_USER0);
 	for (tmp = 0; tmp < qty; tmp++) {
 		if (dbuf->blocks[tmp].pagehdr.magic != AFS_DIR_MAGIC) {
 			printk("kAFS: %s(%lu): bad magic %d/%d is %04hx\n",
@@ -154,10 +154,12 @@ #endif
 			goto error;
 		}
 	}
+	kunmap_atomic(dbuf, KM_USER0);
 
 	return;
 
  error:
+	kunmap_atomic(dbuf, KM_USER0);
 	SetPageError(page);
 
 } /* end afs_dir_check_page() */
@@ -168,7 +170,6 @@ #endif
  */
 static inline void afs_dir_put_page(struct page *page)
 {
-	kunmap(page);
 	page_cache_release(page);
 
 } /* end afs_dir_put_page() */
@@ -186,7 +187,6 @@ static struct page *afs_dir_get_page(str
 	page = read_mapping_page(dir->i_mapping, index, NULL);
 	if (!IS_ERR(page)) {
 		wait_on_page_locked(page);
-		kmap(page);
 		if (!PageUptodate(page))
 			goto fail;
 		afs_dir_check_page(dir, page);
@@ -354,7 +354,7 @@ static int afs_dir_iterate(struct inode 
 
 		limit = blkoff & ~(PAGE_SIZE - 1);
 
-		dbuf = page_address(page);
+		dbuf = kmap_atomic(page, KM_USER0);
 
 		/* deal with the individual blocks stashed on this page */
 		do {
@@ -363,6 +363,7 @@ static int afs_dir_iterate(struct inode 
 			ret = afs_dir_iterate_block(fpos, dblock, blkoff,
 						    cookie, filldir);
 			if (ret != 1) {
+				kunmap_atomic(dbuf, KM_USER0);
 				afs_dir_put_page(page);
 				goto out;
 			}
@@ -371,6 +372,7 @@ static int afs_dir_iterate(struct inode 
 
 		} while (*fpos < dir->i_size && blkoff < limit);
 
+		kunmap_atomic(dbuf, KM_USER0);
 		afs_dir_put_page(page);
 		ret = 0;
 	}
diff --git a/fs/afs/file.c b/fs/afs/file.c
index 67d6634..db441c5 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -16,12 +16,15 @@ #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/fs.h>
 #include <linux/pagemap.h>
+#include <linux/pagevec.h>
 #include <linux/buffer_head.h>
 #include "volume.h"
 #include "vnode.h"
 #include <rxrpc/call.h>
 #include "internal.h"
 
+#define list_to_page(head) (list_entry((head)->prev, struct page, lru))
+
 #if 0
 static int afs_file_open(struct inode *inode, struct file *file);
 static int afs_file_release(struct inode *inode, struct file *file);
@@ -30,55 +33,93 @@ #endif
 static int afs_file_readpage(struct file *file, struct page *page);
 static void afs_file_invalidatepage(struct page *page, unsigned long offset);
 static int afs_file_releasepage(struct page *page, gfp_t gfp_flags);
+static int afs_file_mmap(struct file *file, struct vm_area_struct *vma);
+
+#ifdef CONFIG_AFS_FSCACHE
+static int afs_file_readpages(struct file *filp, struct address_space *mapping,
+			      struct list_head *pages, unsigned nr_pages);
+static int afs_file_page_mkwrite(struct vm_area_struct *vma, struct page *page);
+#endif
 
 struct inode_operations afs_file_inode_operations = {
 	.getattr	= afs_inode_getattr,
 };
 
+const struct file_operations afs_file_file_operations = {
+	.llseek		= generic_file_llseek,
+	.read		= generic_file_read,
+	.mmap		= afs_file_mmap,
+	.sendfile	= generic_file_sendfile,
+};
+
 const struct address_space_operations afs_fs_aops = {
 	.readpage	= afs_file_readpage,
+#ifdef CONFIG_AFS_FSCACHE
+	.readpages	= afs_file_readpages,
+#endif
 	.sync_page	= block_sync_page,
 	.set_page_dirty	= __set_page_dirty_nobuffers,
 	.releasepage	= afs_file_releasepage,
 	.invalidatepage	= afs_file_invalidatepage,
 };
 
+static struct vm_operations_struct afs_fs_vm_operations = {
+	.nopage		= filemap_nopage,
+	.populate	= filemap_populate,
+#ifdef CONFIG_AFS_FSCACHE
+	.page_mkwrite	= afs_file_page_mkwrite,
+#endif
+};
+
+/*****************************************************************************/
+/*
+ * set up a memory mapping on an AFS file
+ * - we set our own VMA ops so that we can catch the page becoming writable for
+ *   userspace for shared-writable mmap
+ */
+static int afs_file_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	_enter("");
+
+	file_accessed(file);
+	vma->vm_ops = &afs_fs_vm_operations;
+	return 0;
+}
+
 /*****************************************************************************/
 /*
  * deal with notification that a page was read from the cache
  */
-#ifdef AFS_CACHING_SUPPORT
-static void afs_file_readpage_read_complete(void *cookie_data,
-					    struct page *page,
+#ifdef CONFIG_AFS_FSCACHE
+static void afs_file_readpage_read_complete(struct page *page,
 					    void *data,
 					    int error)
 {
-	_enter("%p,%p,%p,%d", cookie_data, page, data, error);
+	_enter("%p,%p,%d", page, data, error);
 
-	if (error)
-		SetPageError(page);
-	else
+	/* if the read completes with an error, we just unlock the page and let
+	 * the VM reissue the readpage */
+	if (!error)
 		SetPageUptodate(page);
 	unlock_page(page);
-
-} /* end afs_file_readpage_read_complete() */
+}
 #endif
 
 /*****************************************************************************/
 /*
  * deal with notification that a page was written to the cache
  */
-#ifdef AFS_CACHING_SUPPORT
-static void afs_file_readpage_write_complete(void *cookie_data,
-					     struct page *page,
+#ifdef CONFIG_AFS_FSCACHE
+static void afs_file_readpage_write_complete(struct page *page,
 					     void *data,
 					     int error)
 {
-	_enter("%p,%p,%p,%d", cookie_data, page, data, error);
-
-	unlock_page(page);
+	_enter("%p,%p,%d", page, data, error);
 
-} /* end afs_file_readpage_write_complete() */
+	/* note that the page has been written to the cache and can now be
+	 * modified */
+	end_page_fs_misc(page);
+}
 #endif
 
 /*****************************************************************************/
@@ -88,16 +129,13 @@ #endif
 static int afs_file_readpage(struct file *file, struct page *page)
 {
 	struct afs_rxfs_fetch_descriptor desc;
-#ifdef AFS_CACHING_SUPPORT
-	struct cachefs_page *pageio;
-#endif
 	struct afs_vnode *vnode;
 	struct inode *inode;
 	int ret;
 
 	inode = page->mapping->host;
 
-	_enter("{%lu},{%lu}", inode->i_ino, page->index);
+	_enter("{%lu},%p{%lu}", inode->i_ino, page, page->index);
 
 	vnode = AFS_FS_I(inode);
 
@@ -107,13 +145,9 @@ #endif
 	if (vnode->flags & AFS_VNODE_DELETED)
 		goto error;
 
-#ifdef AFS_CACHING_SUPPORT
-	ret = cachefs_page_get_private(page, &pageio, GFP_NOIO);
-	if (ret < 0)
-		goto error;
-
+#ifdef CONFIG_AFS_FSCACHE
 	/* is it cached? */
-	ret = cachefs_read_or_alloc_page(vnode->cache,
+	ret = fscache_read_or_alloc_page(vnode->cache,
 					 page,
 					 afs_file_readpage_read_complete,
 					 NULL,
@@ -123,18 +157,20 @@ #else
 #endif
 
 	switch (ret) {
-		/* read BIO submitted and wb-journal entry found */
-	case 1:
-		BUG(); // TODO - handle wb-journal match
-
 		/* read BIO submitted (page in cache) */
 	case 0:
 		break;
 
-		/* no page available in cache */
-	case -ENOBUFS:
+		/* page not yet cached */
 	case -ENODATA:
+		_debug("cache said ENODATA");
+		goto go_on;
+
+		/* page will not be cached */
+	case -ENOBUFS:
+		_debug("cache said ENOBUFS");
 	default:
+	go_on:
 		desc.fid	= vnode->fid;
 		desc.offset	= page->index << PAGE_CACHE_SHIFT;
 		desc.size	= min((size_t) (inode->i_size - desc.offset),
@@ -148,34 +184,40 @@ #endif
 		ret = afs_vnode_fetch_data(vnode, &desc);
 		kunmap(page);
 		if (ret < 0) {
-			if (ret==-ENOENT) {
-				_debug("got NOENT from server"
+			if (ret == -ENOENT) {
+				kdebug("got NOENT from server"
 				       " - marking file deleted and stale");
 				vnode->flags |= AFS_VNODE_DELETED;
 				ret = -ESTALE;
 			}
 
-#ifdef AFS_CACHING_SUPPORT
-			cachefs_uncache_page(vnode->cache, page);
+#ifdef CONFIG_AFS_FSCACHE
+			fscache_uncache_page(vnode->cache, page);
+			ClearPagePrivate(page);
 #endif
 			goto error;
 		}
 
 		SetPageUptodate(page);
 
-#ifdef AFS_CACHING_SUPPORT
-		if (cachefs_write_page(vnode->cache,
-				       page,
-				       afs_file_readpage_write_complete,
-				       NULL,
-				       GFP_KERNEL) != 0
-		    ) {
-			cachefs_uncache_page(vnode->cache, page);
-			unlock_page(page);
+		/* send the page to the cache */
+#ifdef CONFIG_AFS_FSCACHE
+		if (PagePrivate(page)) {
+			if (TestSetPageFsMisc(page))
+				BUG();
+			if (fscache_write_page(vnode->cache,
+					       page,
+					       afs_file_readpage_write_complete,
+					       NULL,
+					       GFP_KERNEL) != 0
+			    ) {
+				fscache_uncache_page(vnode->cache, page);
+				ClearPagePrivate(page);
+				end_page_fs_misc(page);
+			}
 		}
-#else
-		unlock_page(page);
 #endif
+		unlock_page(page);
 	}
 
 	_leave(" = 0");
@@ -187,87 +229,124 @@ #endif
 
 	_leave(" = %d", ret);
 	return ret;
-
-} /* end afs_file_readpage() */
+}
 
 /*****************************************************************************/
 /*
- * get a page cookie for the specified page
+ * read a set of pages
  */
-#ifdef AFS_CACHING_SUPPORT
-int afs_cache_get_page_cookie(struct page *page,
-			      struct cachefs_page **_page_cookie)
+#ifdef CONFIG_AFS_FSCACHE
+static int afs_file_readpages(struct file *filp, struct address_space *mapping,
+			      struct list_head *pages, unsigned nr_pages)
 {
-	int ret;
+	struct afs_vnode *vnode;
+	int ret = 0;
 
-	_enter("");
-	ret = cachefs_page_get_private(page,_page_cookie, GFP_NOIO);
+	_enter(",{%lu},,%d", mapping->host->i_ino, nr_pages);
 
-	_leave(" = %d", ret);
+	vnode = AFS_FS_I(mapping->host);
+	if (vnode->flags & AFS_VNODE_DELETED) {
+		_leave(" = -ESTALE");
+		return -ESTALE;
+	}
+
+	/* attempt to read as many of the pages as possible */
+	ret = fscache_read_or_alloc_pages(vnode->cache,
+					  mapping,
+					  pages,
+					  &nr_pages,
+					  afs_file_readpage_read_complete,
+					  NULL,
+					  mapping_gfp_mask(mapping));
+
+	switch (ret) {
+		/* all pages are being read from the cache */
+	case 0:
+		BUG_ON(!list_empty(pages));
+		BUG_ON(nr_pages != 0);
+		_leave(" = 0 [reading all]");
+		return 0;
+
+		/* there were pages that couldn't be read from the cache */
+	case -ENODATA:
+	case -ENOBUFS:
+		break;
+
+		/* other error */
+	default:
+		_leave(" = %d", ret);
+		return ret;
+	}
+
+	/* load the missing pages from the network */
+	ret = read_cache_pages(mapping, pages,
+			       (void *) afs_file_readpage, NULL);
+
+	_leave(" = %d [netting]", ret);
 	return ret;
-} /* end afs_cache_get_page_cookie() */
+}
 #endif
 
 /*****************************************************************************/
 /*
  * invalidate part or all of a page
+ * - release a page and clean up its private data if offset is 0 (indicating
+ *   the entire page)
  */
 static void afs_file_invalidatepage(struct page *page, unsigned long offset)
 {
-	int ret = 1;
-
 	_enter("{%lu},%lu", page->index, offset);
 
 	BUG_ON(!PageLocked(page));
 
 	if (PagePrivate(page)) {
-#ifdef AFS_CACHING_SUPPORT
-		struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
-		cachefs_uncache_page(vnode->cache,page);
+		/* we clean up only if the entire page is being invalidated */
+		if (offset == 0 && !PageWriteback(page)) {
+#ifdef CONFIG_AFS_FSCACHE
+			wait_on_page_fs_misc(page);
+			fscache_uncache_page(
+				AFS_FS_I(page->mapping->host)->cache, page);
+			ClearPagePrivate(page);
 #endif
-
-		/* We release buffers only if the entire page is being
-		 * invalidated.
-		 * The get_block cached value has been unconditionally
-		 * invalidated, so real IO is not possible anymore.
-		 */
-		if (offset == 0) {
-			BUG_ON(!PageLocked(page));
-
-			ret = 0;
-			if (!PageWriteback(page))
-				ret = page->mapping->a_ops->releasepage(page,
-									0);
-			/* possibly should BUG_ON(!ret); - neilb */
 		}
 	}
 
-	_leave(" = %d", ret);
-} /* end afs_file_invalidatepage() */
+	_leave("");
+}
 
 /*****************************************************************************/
 /*
- * release a page and cleanup its private data
+ * release a page and clean up its private state if it's not busy
+ * - return true if the page can now be released, false if not
  */
 static int afs_file_releasepage(struct page *page, gfp_t gfp_flags)
 {
-	struct cachefs_page *pageio;
-
 	_enter("{%lu},%x", page->index, gfp_flags);
 
-	if (PagePrivate(page)) {
-#ifdef AFS_CACHING_SUPPORT
-		struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
-		cachefs_uncache_page(vnode->cache, page);
-#endif
+#ifdef CONFIG_AFS_FSCACHE
+	/* deny if page is being written to the cache */
+	if (PageFsMisc(page)) {
+		_leave(" = F");
+		return 0;
+	}
 
-		pageio = (struct cachefs_page *) page_private(page);
-		set_page_private(page, 0);
-		ClearPagePrivate(page);
+	fscache_uncache_page(AFS_FS_I(page->mapping->host)->cache, page);
+#endif
 
-		kfree(pageio);
-	}
+	/* indicate that the page can be released */
+	_leave(" = T");
+	return 1;
+}
 
-	_leave(" = 0");
+/*****************************************************************************/
+/*
+ * wait for the disc cache to finish writing before permitting modification of
+ * our page in the page cache
+ */
+#ifdef CONFIG_AFS_FSCACHE
+static int afs_file_page_mkwrite(struct vm_area_struct *vma, struct page *page)
+{
+	wait_on_page_fs_misc(page);
 	return 0;
-} /* end afs_file_releasepage() */
+}
+#endif
diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 61bc371..c88c41a 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -398,6 +398,8 @@ int afs_rxfs_fetch_file_status(struct af
 		bp++; /* spare6 */
 	}
 
+	_debug("Data Version %llx", vnode->status.version);
+
 	/* success */
 	ret = 0;
 
@@ -408,7 +410,7 @@ int afs_rxfs_fetch_file_status(struct af
  out_put_conn:
 	afs_server_release_callslot(server, &callslot);
  out:
-	_leave("");
+	_leave(" = %d", ret);
 	return ret;
 
  abort:
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 4ebb30a..0a59eda 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -49,7 +49,7 @@ static int afs_inode_map_status(struct a
 	case AFS_FTYPE_FILE:
 		inode->i_mode	= S_IFREG | vnode->status.mode;
 		inode->i_op	= &afs_file_inode_operations;
-		inode->i_fop	= &generic_ro_fops;
+		inode->i_fop	= &afs_file_file_operations;
 		break;
 	case AFS_FTYPE_DIR:
 		inode->i_mode	= S_IFDIR | vnode->status.mode;
@@ -65,6 +65,11 @@ static int afs_inode_map_status(struct a
 		return -EBADMSG;
 	}
 
+#ifdef CONFIG_AFS_FSCACHE
+	if (vnode->status.size != inode->i_size)
+		fscache_set_i_size(vnode->cache, vnode->status.size);
+#endif
+
 	inode->i_nlink		= vnode->status.nlink;
 	inode->i_uid		= vnode->status.owner;
 	inode->i_gid		= 0;
@@ -101,13 +106,33 @@ static int afs_inode_fetch_status(struct
 	struct afs_vnode *vnode;
 	int ret;
 
+	_enter("");
+
 	vnode = AFS_FS_I(inode);
 
 	ret = afs_vnode_fetch_status(vnode);
 
-	if (ret == 0)
+	if (ret == 0) {
+#ifdef CONFIG_AFS_FSCACHE
+		if (!vnode->cache) {
+			vnode->cache =
+				fscache_acquire_cookie(vnode->volume->cache,
+						       &afs_vnode_cache_index_def,
+						       vnode);
+			if (!vnode->cache)
+				_debug("no cache cookie acquired");
+		}
+#endif
 		ret = afs_inode_map_status(vnode);
+#ifdef CONFIG_AFS_FSCACHE
+		if (ret < 0) {
+			fscache_relinquish_cookie(vnode->cache, 0);
+			vnode->cache = NULL;
+		}
+#endif
+	}
 
+	_leave(" = %d", ret);
 	return ret;
 
 } /* end afs_inode_fetch_status() */
@@ -122,6 +147,7 @@ static int afs_iget5_test(struct inode *
 
 	return inode->i_ino == data->fid.vnode &&
 		inode->i_version == data->fid.unique;
+
 } /* end afs_iget5_test() */
 
 /*****************************************************************************/
@@ -179,20 +205,11 @@ inline int afs_iget(struct super_block *
 		return ret;
 	}
 
-#ifdef AFS_CACHING_SUPPORT
-	/* set up caching before reading the status, as fetch-status reads the
-	 * first page of symlinks to see if they're really mntpts */
-	cachefs_acquire_cookie(vnode->volume->cache,
-			       NULL,
-			       vnode,
-			       &vnode->cache);
-#endif
-
 	/* okay... it's a new inode */
 	inode->i_flags |= S_NOATIME;
 	vnode->flags |= AFS_VNODE_CHANGED;
 	ret = afs_inode_fetch_status(inode);
-	if (ret<0)
+	if (ret < 0)
 		goto bad_inode;
 
 	/* success */
@@ -278,8 +295,8 @@ void afs_clear_inode(struct inode *inode
 
 	afs_vnode_give_up_callback(vnode);
 
-#ifdef AFS_CACHING_SUPPORT
-	cachefs_relinquish_cookie(vnode->cache, 0);
+#ifdef CONFIG_AFS_FSCACHE
+	fscache_relinquish_cookie(vnode->cache, 0);
 	vnode->cache = NULL;
 #endif
 
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index e88b3b6..482dbd1 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -16,15 +16,17 @@ #include <linux/compiler.h>
 #include <linux/kernel.h>
 #include <linux/fs.h>
 #include <linux/pagemap.h>
+#include <linux/fscache.h>
 
 /*
  * debug tracing
  */
-#define kenter(FMT, a...)	printk("==> %s("FMT")\n",__FUNCTION__ , ## a)
-#define kleave(FMT, a...)	printk("<== %s()"FMT"\n",__FUNCTION__ , ## a)
-#define kdebug(FMT, a...)	printk(FMT"\n" , ## a)
-#define kproto(FMT, a...)	printk("### "FMT"\n" , ## a)
-#define knet(FMT, a...)		printk(FMT"\n" , ## a)
+#define __kdbg(FMT, a...)	printk("[%05d] "FMT"\n", current->pid , ## a)
+#define kenter(FMT, a...)	__kdbg("==> %s("FMT")", __FUNCTION__ , ## a)
+#define kleave(FMT, a...)	__kdbg("<== %s()"FMT, __FUNCTION__ , ## a)
+#define kdebug(FMT, a...)	__kdbg(FMT , ## a)
+#define kproto(FMT, a...)	__kdbg("### "FMT , ## a)
+#define knet(FMT, a...)		__kdbg(FMT , ## a)
 
 #ifdef __KDEBUG
 #define _enter(FMT, a...)	kenter(FMT , ## a)
@@ -56,9 +58,6 @@ static inline void afs_discard_my_signal
  */
 extern struct rw_semaphore afs_proc_cells_sem;
 extern struct list_head afs_proc_cells;
-#ifdef AFS_CACHING_SUPPORT
-extern struct cachefs_index_def afs_cache_cell_index_def;
-#endif
 
 /*
  * dir.c
@@ -71,11 +70,7 @@ extern const struct file_operations afs_
  */
 extern const struct address_space_operations afs_fs_aops;
 extern struct inode_operations afs_file_inode_operations;
-
-#ifdef AFS_CACHING_SUPPORT
-extern int afs_cache_get_page_cookie(struct page *page,
-				     struct cachefs_page **_page_cookie);
-#endif
+extern const struct file_operations afs_file_file_operations;
 
 /*
  * inode.c
@@ -97,8 +92,8 @@ #endif
 /*
  * main.c
  */
-#ifdef AFS_CACHING_SUPPORT
-extern struct cachefs_netfs afs_cache_netfs;
+#ifdef CONFIG_AFS_FSCACHE
+extern struct fscache_netfs afs_cache_netfs;
 #endif
 
 /*
diff --git a/fs/afs/main.c b/fs/afs/main.c
index 913c689..5840bb2 100644
--- a/fs/afs/main.c
+++ b/fs/afs/main.c
@@ -1,6 +1,6 @@
 /* main.c: AFS client file system
  *
- * Copyright (C) 2002 Red Hat, Inc. All Rights Reserved.
+ * Copyright (C) 2002, 2005 Red Hat, Inc. All Rights Reserved.
  * Written by David Howells (dhowells@redhat.com)
  *
  * This program is free software; you can redistribute it and/or
@@ -14,11 +14,11 @@ #include <linux/moduleparam.h>
 #include <linux/init.h>
 #include <linux/sched.h>
 #include <linux/completion.h>
+#include <linux/fscache.h>
 #include <rxrpc/rxrpc.h>
 #include <rxrpc/transport.h>
 #include <rxrpc/call.h>
 #include <rxrpc/peer.h>
-#include "cache.h"
 #include "cell.h"
 #include "server.h"
 #include "fsclient.h"
@@ -51,12 +51,11 @@ static struct rxrpc_peer_ops afs_peer_op
 struct list_head afs_cb_hash_tbl[AFS_CB_HASH_COUNT];
 DEFINE_SPINLOCK(afs_cb_hash_lock);
 
-#ifdef AFS_CACHING_SUPPORT
-static struct cachefs_netfs_operations afs_cache_ops = {
-	.get_page_cookie	= afs_cache_get_page_cookie,
+#ifdef CONFIG_AFS_FSCACHE
+static struct fscache_netfs_operations afs_cache_ops = {
 };
 
-struct cachefs_netfs afs_cache_netfs = {
+struct fscache_netfs afs_cache_netfs = {
 	.name			= "afs",
 	.version		= 0,
 	.ops			= &afs_cache_ops,
@@ -83,10 +82,9 @@ static int __init afs_init(void)
 	if (ret < 0)
 		return ret;
 
-#ifdef AFS_CACHING_SUPPORT
+#ifdef CONFIG_AFS_FSCACHE
 	/* we want to be able to cache */
-	ret = cachefs_register_netfs(&afs_cache_netfs,
-				     &afs_cache_cell_index_def);
+	ret = fscache_register_netfs(&afs_cache_netfs);
 	if (ret < 0)
 		goto error;
 #endif
@@ -137,8 +135,8 @@ #ifdef CONFIG_KEYS_TURNED_OFF
 	afs_key_unregister();
  error_cache:
 #endif
-#ifdef AFS_CACHING_SUPPORT
-	cachefs_unregister_netfs(&afs_cache_netfs);
+#ifdef CONFIG_AFS_FSCACHE
+	fscache_unregister_netfs(&afs_cache_netfs);
  error:
 #endif
 	afs_cell_purge();
@@ -167,8 +165,8 @@ static void __exit afs_exit(void)
 #ifdef CONFIG_KEYS_TURNED_OFF
 	afs_key_unregister();
 #endif
-#ifdef AFS_CACHING_SUPPORT
-	cachefs_unregister_netfs(&afs_cache_netfs);
+#ifdef CONFIG_AFS_FSCACHE
+	fscache_unregister_netfs(&afs_cache_netfs);
 #endif
 	afs_proc_cleanup();
 
diff --git a/fs/afs/mntpt.c b/fs/afs/mntpt.c
index 99785a7..2a53d51 100644
--- a/fs/afs/mntpt.c
+++ b/fs/afs/mntpt.c
@@ -78,7 +78,7 @@ int afs_mntpt_check_symlink(struct afs_v
 
 	ret = -EIO;
 	wait_on_page_locked(page);
-	buf = kmap(page);
+	buf = kmap_atomic(page, KM_USER0);
 	if (!PageUptodate(page))
 		goto out_free;
 	if (PageError(page))
@@ -101,7 +101,7 @@ int afs_mntpt_check_symlink(struct afs_v
 	ret = 0;
 
  out_free:
-	kunmap(page);
+	kunmap_atomic(buf, KM_USER0);
 	page_cache_release(page);
  out:
 	_leave(" = %d", ret);
@@ -188,9 +188,9 @@ static struct vfsmount *afs_mntpt_do_aut
 	if (!PageUptodate(page) || PageError(page))
 		goto error;
 
-	buf = kmap(page);
+	buf = kmap_atomic(page, KM_USER0);
 	memcpy(devname, buf, size);
-	kunmap(page);
+	kunmap_atomic(buf, KM_USER0);
 	page_cache_release(page);
 	page = NULL;
 
@@ -269,12 +269,12 @@ static void *afs_mntpt_follow_link(struc
  */
 static void afs_mntpt_expiry_timed_out(struct afs_timer *timer)
 {
-	kenter("");
+	_enter("");
 
 	mark_mounts_for_expiry(&afs_vfsmounts);
 
 	afs_kafstimod_add_timer(&afs_mntpt_expiry_timer,
 				afs_mntpt_expiry_timeout * HZ);
 
-	kleave("");
+	_leave("");
 } /* end afs_mntpt_expiry_timed_out() */
diff --git a/fs/afs/proc.c b/fs/afs/proc.c
index 101d21b..db58488 100644
--- a/fs/afs/proc.c
+++ b/fs/afs/proc.c
@@ -177,6 +177,7 @@ int afs_proc_init(void)
  */
 void afs_proc_cleanup(void)
 {
+	remove_proc_entry("rootcell", proc_afs);
 	remove_proc_entry("cells", proc_afs);
 
 	remove_proc_entry("fs/afs", NULL);
diff --git a/fs/afs/server.c b/fs/afs/server.c
index 22afaae..e94628c 100644
--- a/fs/afs/server.c
+++ b/fs/afs/server.c
@@ -375,7 +375,6 @@ int afs_server_request_callslot(struct a
 	else if (list_empty(&server->fs_callq)) {
 		/* no one waiting */
 		server->fs_conn_cnt[nconn]++;
-		spin_unlock(&server->fs_lock);
 	}
 	else {
 		/* someone's waiting - dequeue them and wake them up */
@@ -393,9 +392,9 @@ int afs_server_request_callslot(struct a
 		}
 		pcallslot->ready = 1;
 		wake_up_process(pcallslot->task);
-		spin_unlock(&server->fs_lock);
 	}
 
+	spin_unlock(&server->fs_lock);
 	rxrpc_put_connection(callslot->conn);
 	callslot->conn = NULL;
 
diff --git a/fs/afs/vlocation.c b/fs/afs/vlocation.c
index 331f730..20148bc 100644
--- a/fs/afs/vlocation.c
+++ b/fs/afs/vlocation.c
@@ -59,17 +59,21 @@ static LIST_HEAD(afs_vlocation_update_pe
 static struct afs_vlocation *afs_vlocation_update;	/* VL currently being updated */
 static DEFINE_SPINLOCK(afs_vlocation_update_lock); /* lock guarding update queue */
 
-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_vlocation_cache_match(void *target,
-						     const void *entry);
-static void afs_vlocation_cache_update(void *source, void *entry);
-
-struct cachefs_index_def afs_vlocation_cache_index_def = {
-	.name		= "vldb",
-	.data_size	= sizeof(struct afs_cache_vlocation),
-	.keys[0]	= { CACHEFS_INDEX_KEYS_ASCIIZ, 64 },
-	.match		= afs_vlocation_cache_match,
-	.update		= afs_vlocation_cache_update,
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_vlocation_cache_get_key(const void *cookie_netfs_data,
+					    void *buffer, uint16_t buflen);
+static uint16_t afs_vlocation_cache_get_aux(const void *cookie_netfs_data,
+					    void *buffer, uint16_t buflen);
+static fscache_checkaux_t afs_vlocation_cache_check_aux(void *cookie_netfs_data,
+							const void *buffer,
+							uint16_t buflen);
+
+static struct fscache_cookie_def afs_vlocation_cache_index_def = {
+	.name		= "AFS.vldb",
+	.type		= FSCACHE_COOKIE_TYPE_INDEX,
+	.get_key	= afs_vlocation_cache_get_key,
+	.get_aux	= afs_vlocation_cache_get_aux,
+	.check_aux	= afs_vlocation_cache_check_aux,
 };
 #endif
 
@@ -300,13 +304,12 @@ int afs_vlocation_lookup(struct afs_cell
 
 	list_add_tail(&vlocation->link, &cell->vl_list);
 
-#ifdef AFS_CACHING_SUPPORT
+#ifdef CONFIG_AFS_FSCACHE
 	/* we want to store it in the cache, plus it might already be
 	 * encached */
-	cachefs_acquire_cookie(cell->cache,
-			       &afs_volume_cache_index_def,
-			       vlocation,
-			       &vlocation->cache);
+	vlocation->cache = fscache_acquire_cookie(cell->cache,
+						  &afs_vlocation_cache_index_def,
+						  vlocation);
 
 	if (vlocation->valid)
 		goto found_in_cache;
@@ -340,7 +343,7 @@ #endif
  active:
 	active = 1;
 
-#ifdef AFS_CACHING_SUPPORT
+#ifdef CONFIG_AFS_FSCACHE
  found_in_cache:
 #endif
 	/* try to look up a cached volume in the cell VL databases by ID */
@@ -422,9 +425,9 @@ #endif
 
 	afs_kafstimod_add_timer(&vlocation->upd_timer, 10 * HZ);
 
-#ifdef AFS_CACHING_SUPPORT
+#ifdef CONFIG_AFS_FSCACHE
 	/* update volume entry in local cache */
-	cachefs_update_cookie(vlocation->cache);
+	fscache_update_cookie(vlocation->cache);
 #endif
 
 	*_vlocation = vlocation;
@@ -438,8 +441,8 @@ #endif
 		}
 		else {
 			list_del(&vlocation->link);
-#ifdef AFS_CACHING_SUPPORT
-			cachefs_relinquish_cookie(vlocation->cache, 0);
+#ifdef CONFIG_AFS_FSCACHE
+			fscache_relinquish_cookie(vlocation->cache, 0);
 #endif
 			afs_put_cell(vlocation->cell);
 			kfree(vlocation);
@@ -536,8 +539,8 @@ void afs_vlocation_do_timeout(struct afs
 	}
 
 	/* we can now destroy it properly */
-#ifdef AFS_CACHING_SUPPORT
-	cachefs_relinquish_cookie(vlocation->cache, 0);
+#ifdef CONFIG_AFS_FSCACHE
+	fscache_relinquish_cookie(vlocation->cache, 0);
 #endif
 	afs_put_cell(cell);
 
@@ -888,65 +891,103 @@ static void afs_vlocation_update_discard
 
 /*****************************************************************************/
 /*
- * match a VLDB record stored in the cache
- * - may also load target from entry
+ * set the key for the index entry
  */
-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_vlocation_cache_match(void *target,
-						     const void *entry)
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_vlocation_cache_get_key(const void *cookie_netfs_data,
+					    void *buffer, uint16_t bufmax)
 {
-	const struct afs_cache_vlocation *vldb = entry;
-	struct afs_vlocation *vlocation = target;
+	const struct afs_vlocation *vlocation = cookie_netfs_data;
+	uint16_t klen;
 
-	_enter("{%s},{%s}", vlocation->vldb.name, vldb->name);
+	_enter("{%s},%p,%u", vlocation->vldb.name, buffer, bufmax);
 
-	if (strncmp(vlocation->vldb.name, vldb->name, sizeof(vldb->name)) == 0
-	    ) {
-		if (!vlocation->valid ||
-		    vlocation->vldb.rtime == vldb->rtime
-		    ) {
-			vlocation->vldb = *vldb;
-			vlocation->valid = 1;
-			_leave(" = SUCCESS [c->m]");
-			return CACHEFS_MATCH_SUCCESS;
-		}
-		/* need to update cache if cached info differs */
-		else if (memcmp(&vlocation->vldb, vldb, sizeof(*vldb)) != 0) {
-			/* delete if VIDs for this name differ */
-			if (memcmp(&vlocation->vldb.vid,
-				   &vldb->vid,
-				   sizeof(vldb->vid)) != 0) {
-				_leave(" = DELETE");
-				return CACHEFS_MATCH_SUCCESS_DELETE;
-			}
+	klen = strnlen(vlocation->vldb.name, sizeof(vlocation->vldb.name));
+	if (klen > bufmax)
+		return 0;
 
-			_leave(" = UPDATE");
-			return CACHEFS_MATCH_SUCCESS_UPDATE;
-		}
-		else {
-			_leave(" = SUCCESS");
-			return CACHEFS_MATCH_SUCCESS;
-		}
-	}
+	memcpy(buffer, vlocation->vldb.name, klen);
+
+	_leave(" = %u", klen);
+	return klen;
 
-	_leave(" = FAILED");
-	return CACHEFS_MATCH_FAILED;
-} /* end afs_vlocation_cache_match() */
+} /* end afs_vlocation_cache_get_key() */
 #endif
 
 /*****************************************************************************/
 /*
- * update a VLDB record stored in the cache
+ * provide new auxiliary cache data
  */
-#ifdef AFS_CACHING_SUPPORT
-static void afs_vlocation_cache_update(void *source, void *entry)
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_vlocation_cache_get_aux(const void *cookie_netfs_data,
+					    void *buffer, uint16_t bufmax)
 {
-	struct afs_cache_vlocation *vldb = entry;
-	struct afs_vlocation *vlocation = source;
+	const struct afs_vlocation *vlocation = cookie_netfs_data;
+	uint16_t dlen;
 
-	_enter("");
+	_enter("{%s},%p,%u", vlocation->vldb.name, buffer, bufmax);
+
+	dlen = sizeof(struct afs_cache_vlocation);
+	dlen -= offsetof(struct afs_cache_vlocation, nservers);
+	if (dlen > bufmax)
+		return 0;
+
+	memcpy(buffer, (uint8_t *)&vlocation->vldb.nservers, dlen);
+
+	_leave(" = %u", dlen);
+	return dlen;
+
+} /* end afs_vlocation_cache_get_aux() */
+#endif
+
+/*****************************************************************************/
+/*
+ * check that the auxiliary data indicates that the entry is still valid
+ */
+#ifdef CONFIG_AFS_FSCACHE
+static fscache_checkaux_t afs_vlocation_cache_check_aux(void *cookie_netfs_data,
+							const void *buffer,
+							uint16_t buflen)
+{
+	const struct afs_cache_vlocation *cvldb;
+	struct afs_vlocation *vlocation = cookie_netfs_data;
+	uint16_t dlen;
+
+	_enter("{%s},%p,%u", vlocation->vldb.name, buffer, buflen);
+
+	/* check the size of the data is what we're expecting */
+	dlen = sizeof(struct afs_cache_vlocation);
+	dlen -= offsetof(struct afs_cache_vlocation, nservers);
+	if (dlen != buflen)
+		return FSCACHE_CHECKAUX_OBSOLETE;
+
+	cvldb = container_of(buffer, struct afs_cache_vlocation, nservers);
+
+	/* if what's on disk is more valid than what's in memory, then use the
+	 * VL record from the cache */
+	if (!vlocation->valid || vlocation->vldb.rtime == cvldb->rtime) {
+		memcpy((uint8_t *)&vlocation->vldb.nservers, buffer, dlen);
+		vlocation->valid = 1;
+		_leave(" = SUCCESS [c->m]");
+		return FSCACHE_CHECKAUX_OKAY;
+	}
+
+	/* need to update the cache if the cached info differs */
+	if (memcmp(&vlocation->vldb.nservers, buffer, dlen) != 0) {
+		/* delete if the volume IDs for this name differ */
+		if (memcmp(&vlocation->vldb.vid, &cvldb->vid,
+			   sizeof(cvldb->vid)) != 0
+		    ) {
+			_leave(" = OBSOLETE");
+			return FSCACHE_CHECKAUX_OBSOLETE;
+		}
+
+		_leave(" = UPDATE");
+		return FSCACHE_CHECKAUX_NEEDS_UPDATE;
+	}
 
-	*vldb = vlocation->vldb;
+	_leave(" = OKAY");
+	return FSCACHE_CHECKAUX_OKAY;
 
-} /* end afs_vlocation_cache_update() */
+} /* end afs_vlocation_cache_check_aux() */
 #endif
diff --git a/fs/afs/vnode.c b/fs/afs/vnode.c
index cf62da5..cd72674 100644
--- a/fs/afs/vnode.c
+++ b/fs/afs/vnode.c
@@ -29,17 +29,30 @@ struct afs_timer_ops afs_vnode_cb_timed_
 	.timed_out	= afs_vnode_cb_timed_out,
 };
 
-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_vnode_cache_match(void *target,
-						 const void *entry);
-static void afs_vnode_cache_update(void *source, void *entry);
-
-struct cachefs_index_def afs_vnode_cache_index_def = {
-	.name		= "vnode",
-	.data_size	= sizeof(struct afs_cache_vnode),
-	.keys[0]	= { CACHEFS_INDEX_KEYS_BIN, 4 },
-	.match		= afs_vnode_cache_match,
-	.update		= afs_vnode_cache_update,
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_vnode_cache_get_key(const void *cookie_netfs_data,
+					void *buffer, uint16_t buflen);
+static void afs_vnode_cache_get_attr(const void *cookie_netfs_data,
+				     uint64_t *size);
+static uint16_t afs_vnode_cache_get_aux(const void *cookie_netfs_data,
+					void *buffer, uint16_t buflen);
+static fscache_checkaux_t afs_vnode_cache_check_aux(void *cookie_netfs_data,
+						    const void *buffer,
+						    uint16_t buflen);
+static void afs_vnode_cache_mark_pages_cached(void *cookie_netfs_data,
+					      struct address_space *mapping,
+					      struct pagevec *cached_pvec);
+static void afs_vnode_cache_now_uncached(void *cookie_netfs_data);
+
+struct fscache_cookie_def afs_vnode_cache_index_def = {
+	.name			= "AFS.vnode",
+	.type			= FSCACHE_COOKIE_TYPE_DATAFILE,
+	.get_key		= afs_vnode_cache_get_key,
+	.get_attr		= afs_vnode_cache_get_attr,
+	.get_aux		= afs_vnode_cache_get_aux,
+	.check_aux		= afs_vnode_cache_check_aux,
+	.mark_pages_cached	= afs_vnode_cache_mark_pages_cached,
+	.now_uncached		= afs_vnode_cache_now_uncached,
 };
 #endif
 
@@ -188,6 +201,8 @@ int afs_vnode_fetch_status(struct afs_vn
 
 	if (vnode->update_cnt > 0) {
 		/* someone else started a fetch */
+		_debug("conflict");
+
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		add_wait_queue(&vnode->update_waitq, &myself);
 
@@ -219,6 +234,7 @@ int afs_vnode_fetch_status(struct afs_vn
 		spin_unlock(&vnode->lock);
 		set_current_state(TASK_RUNNING);
 
+		_leave(" [conflicted, %d]", !!(vnode->flags & AFS_VNODE_DELETED));
 		return vnode->flags & AFS_VNODE_DELETED ? -ENOENT : 0;
 	}
 
@@ -341,54 +357,200 @@ int afs_vnode_give_up_callback(struct af
 
 /*****************************************************************************/
 /*
- * match a vnode record stored in the cache
+ * set the key for the index entry
  */
-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_vnode_cache_match(void *target,
-						 const void *entry)
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_vnode_cache_get_key(const void *cookie_netfs_data,
+					void *buffer, uint16_t bufmax)
 {
-	const struct afs_cache_vnode *cvnode = entry;
-	struct afs_vnode *vnode = target;
+	const struct afs_vnode *vnode = cookie_netfs_data;
+	uint16_t klen;
 
-	_enter("{%x,%x,%Lx},{%x,%x,%Lx}",
-	       vnode->fid.vnode,
-	       vnode->fid.unique,
-	       vnode->status.version,
-	       cvnode->vnode_id,
-	       cvnode->vnode_unique,
-	       cvnode->data_version);
-
-	if (vnode->fid.vnode != cvnode->vnode_id) {
-		_leave(" = FAILED");
-		return CACHEFS_MATCH_FAILED;
+	_enter("{%x,%x,%Lx},%p,%u",
+	       vnode->fid.vnode, vnode->fid.unique, vnode->status.version,
+	       buffer, bufmax);
+
+	klen = sizeof(vnode->fid.vnode);
+	if (klen > bufmax)
+		return 0;
+
+	memcpy(buffer, &vnode->fid.vnode, sizeof(vnode->fid.vnode));
+
+	_leave(" = %u", klen);
+	return klen;
+
+} /* end afs_vnode_cache_get_key() */
+#endif
+
+/*****************************************************************************/
+/*
+ * provide updated file attributes
+ */
+#ifdef CONFIG_AFS_FSCACHE
+static void afs_vnode_cache_get_attr(const void *cookie_netfs_data,
+				     uint64_t *size)
+{
+	const struct afs_vnode *vnode = cookie_netfs_data;
+
+	_enter("{%x,%x,%Lx},",
+	       vnode->fid.vnode, vnode->fid.unique, vnode->status.version);
+
+	*size = i_size_read((struct inode *) &vnode->vfs_inode);
+
+} /* end afs_vnode_cache_get_attr() */
+#endif
+
+/*****************************************************************************/
+/*
+ * provide new auxiliary cache data
+ */
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_vnode_cache_get_aux(const void *cookie_netfs_data,
+					void *buffer, uint16_t bufmax)
+{
+	const struct afs_vnode *vnode = cookie_netfs_data;
+	uint16_t dlen;
+
+	_enter("{%x,%x,%Lx},%p,%u",
+	       vnode->fid.vnode, vnode->fid.unique, vnode->status.version,
+	       buffer, bufmax);
+
+	dlen = sizeof(vnode->fid.unique) + sizeof(vnode->status.version);
+	if (dlen > bufmax)
+		return 0;
+
+	memcpy(buffer, &vnode->fid.unique, sizeof(vnode->fid.unique));
+	buffer += sizeof(vnode->fid.unique);
+	memcpy(buffer, &vnode->status.version, sizeof(vnode->status.version));
+
+	_leave(" = %u", dlen);
+	return dlen;
+
+} /* end afs_vnode_cache_get_aux() */
+#endif
+
+/*****************************************************************************/
+/*
+ * check that the auxiliary data indicates that the entry is still valid
+ */
+#ifdef CONFIG_AFS_FSCACHE
+static fscache_checkaux_t afs_vnode_cache_check_aux(void *cookie_netfs_data,
+						    const void *buffer,
+						    uint16_t buflen)
+{
+	struct afs_vnode *vnode = cookie_netfs_data;
+	uint16_t dlen;
+
+	_enter("{%x,%x,%Lx},%p,%u",
+	       vnode->fid.vnode, vnode->fid.unique, vnode->status.version,
+	       buffer, buflen);
+
+	/* check the size of the data is what we're expecting */
+	dlen = sizeof(vnode->fid.unique) + sizeof(vnode->status.version);
+	if (dlen != buflen) {
+		_leave(" = OBSOLETE [len %hx != %hx]", dlen, buflen);
+		return FSCACHE_CHECKAUX_OBSOLETE;
 	}
 
-	if (vnode->fid.unique != cvnode->vnode_unique ||
-	    vnode->status.version != cvnode->data_version) {
-		_leave(" = DELETE");
-		return CACHEFS_MATCH_SUCCESS_DELETE;
+	if (memcmp(buffer,
+		   &vnode->fid.unique,
+		   sizeof(vnode->fid.unique)
+		   ) != 0
+	    ) {
+		unsigned unique;
+
+		memcpy(&unique, buffer, sizeof(unique));
+
+		_leave(" = OBSOLETE [uniq %x != %x]",
+		       unique, vnode->fid.unique);
+		return FSCACHE_CHECKAUX_OBSOLETE;
+	}
+
+	if (memcmp(buffer + sizeof(vnode->fid.unique),
+		   &vnode->status.version,
+		   sizeof(vnode->status.version)
+		   ) != 0
+	    ) {
+		afs_dataversion_t version;
+
+		memcpy(&version, buffer + sizeof(vnode->fid.unique),
+		       sizeof(version));
+
+		_leave(" = OBSOLETE [vers %llx != %llx]",
+		       version, vnode->status.version);
+		return FSCACHE_CHECKAUX_OBSOLETE;
 	}
 
 	_leave(" = SUCCESS");
-	return CACHEFS_MATCH_SUCCESS;
-} /* end afs_vnode_cache_match() */
+	return FSCACHE_CHECKAUX_OKAY;
+
+} /* end afs_vnode_cache_check_aux() */
 #endif
 
 /*****************************************************************************/
 /*
- * update a vnode record stored in the cache
+ * indication of pages that now have cache metadata retained
+ * - this function should mark the specified pages as now being cached
  */
-#ifdef AFS_CACHING_SUPPORT
-static void afs_vnode_cache_update(void *source, void *entry)
+#ifdef CONFIG_AFS_FSCACHE
+static void afs_vnode_cache_mark_pages_cached(void *cookie_netfs_data,
+					      struct address_space *mapping,
+					      struct pagevec *cached_pvec)
 {
-	struct afs_cache_vnode *cvnode = entry;
-	struct afs_vnode *vnode = source;
+	unsigned long loop;
+
+	for (loop = 0; loop < cached_pvec->nr; loop++) {
+		struct page *page = cached_pvec->pages[loop];
 
-	_enter("");
+		_debug("- mark %p{%lx}", page, page->index);
 
-	cvnode->vnode_id	= vnode->fid.vnode;
-	cvnode->vnode_unique	= vnode->fid.unique;
-	cvnode->data_version	= vnode->status.version;
+		SetPagePrivate(page);
+	}
+
+} /* end afs_vnode_cache_mark_pages_cached() */
+#endif
+
+/*****************************************************************************/
+/*
+ * indication the cookie is no longer cached
+ * - this function is called when the backing store currently caching a cookie
+ *   is removed
+ * - the netfs should use this to clean up any markers indicating cached pages
+ * - this is mandatory for any object that may have data
+ */
+#ifdef CONFIG_AFS_FSCACHE
+static void afs_vnode_cache_now_uncached(void *cookie_netfs_data)
+{
+	struct afs_vnode *vnode = cookie_netfs_data;
+	struct pagevec pvec;
+	pgoff_t first;
+	int loop, nr_pages;
+
+	_enter("{%x,%x,%Lx}",
+	       vnode->fid.vnode, vnode->fid.unique, vnode->status.version);
+
+	pagevec_init(&pvec, 0);
+	first = 0;
+
+	for (;;) {
+		/* grab a bunch of pages to clean */
+		nr_pages = pagevec_lookup(&pvec, vnode->vfs_inode.i_mapping,
+					  first,
+					  PAGEVEC_SIZE - pagevec_count(&pvec));
+		if (!nr_pages)
+			break;
+
+		for (loop = 0; loop < nr_pages; loop++)
+			ClearPagePrivate(pvec.pages[loop]);
+
+		first = pvec.pages[nr_pages - 1]->index + 1;
+
+		pvec.nr = nr_pages;
+		pagevec_release(&pvec);
+		cond_resched();
+	}
+
+	_leave("");
 
-} /* end afs_vnode_cache_update() */
+} /* end afs_vnode_cache_now_uncached() */
 #endif
diff --git a/fs/afs/vnode.h b/fs/afs/vnode.h
index b86a971..3f0602d 100644
--- a/fs/afs/vnode.h
+++ b/fs/afs/vnode.h
@@ -13,9 +13,9 @@ #ifndef _LINUX_AFS_VNODE_H
 #define _LINUX_AFS_VNODE_H
 
 #include <linux/fs.h>
+#include <linux/fscache.h>
 #include "server.h"
 #include "kafstimod.h"
-#include "cache.h"
 
 #ifdef __KERNEL__
 
@@ -32,8 +32,8 @@ struct afs_cache_vnode
 	afs_dataversion_t	data_version;	/* data version */
 };
 
-#ifdef AFS_CACHING_SUPPORT
-extern struct cachefs_index_def afs_vnode_cache_index_def;
+#ifdef CONFIG_AFS_FSCACHE
+extern struct fscache_cookie_def afs_vnode_cache_index_def;
 #endif
 
 /*****************************************************************************/
@@ -47,8 +47,8 @@ struct afs_vnode
 	struct afs_volume	*volume;	/* volume on which vnode resides */
 	struct afs_fid		fid;		/* the file identifier for this inode */
 	struct afs_file_status	status;		/* AFS status info for this file */
-#ifdef AFS_CACHING_SUPPORT
-	struct cachefs_cookie	*cache;		/* caching cookie */
+#ifdef CONFIG_AFS_FSCACHE
+	struct fscache_cookie	*cache;		/* caching cookie */
 #endif
 
 	wait_queue_head_t	update_waitq;	/* status fetch waitqueue */
diff --git a/fs/afs/volume.c b/fs/afs/volume.c
index 0ff4b86..0bd5578 100644
--- a/fs/afs/volume.c
+++ b/fs/afs/volume.c
@@ -15,10 +15,10 @@ #include <linux/init.h>
 #include <linux/slab.h>
 #include <linux/fs.h>
 #include <linux/pagemap.h>
+#include <linux/fscache.h>
 #include "volume.h"
 #include "vnode.h"
 #include "cell.h"
-#include "cache.h"
 #include "cmservice.h"
 #include "fsclient.h"
 #include "vlclient.h"
@@ -28,18 +28,14 @@ #ifdef __KDEBUG
 static const char *afs_voltypes[] = { "R/W", "R/O", "BAK" };
 #endif
 
-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_volume_cache_match(void *target,
-						  const void *entry);
-static void afs_volume_cache_update(void *source, void *entry);
-
-struct cachefs_index_def afs_volume_cache_index_def = {
-	.name		= "volume",
-	.data_size	= sizeof(struct afs_cache_vhash),
-	.keys[0]	= { CACHEFS_INDEX_KEYS_BIN, 1 },
-	.keys[1]	= { CACHEFS_INDEX_KEYS_BIN, 1 },
-	.match		= afs_volume_cache_match,
-	.update		= afs_volume_cache_update,
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_volume_cache_get_key(const void *cookie_netfs_data,
+					 void *buffer, uint16_t buflen);
+
+static struct fscache_cookie_def afs_volume_cache_index_def = {
+	.name		= "AFS.volume",
+	.type		= FSCACHE_COOKIE_TYPE_INDEX,
+	.get_key	= afs_volume_cache_get_key,
 };
 #endif
 
@@ -214,11 +210,10 @@ int afs_volume_lookup(const char *name, 
 	}
 
 	/* attach the cache and volume location */
-#ifdef AFS_CACHING_SUPPORT
-	cachefs_acquire_cookie(vlocation->cache,
-			       &afs_vnode_cache_index_def,
-			       volume,
-			       &volume->cache);
+#ifdef CONFIG_AFS_FSCACHE
+	volume->cache = fscache_acquire_cookie(vlocation->cache,
+					       &afs_volume_cache_index_def,
+					       volume);
 #endif
 
 	afs_get_vlocation(vlocation);
@@ -286,8 +281,8 @@ void afs_put_volume(struct afs_volume *v
 	up_write(&vlocation->cell->vl_sem);
 
 	/* finish cleaning up the volume */
-#ifdef AFS_CACHING_SUPPORT
-	cachefs_relinquish_cookie(volume->cache, 0);
+#ifdef CONFIG_AFS_FSCACHE
+	fscache_relinquish_cookie(volume->cache, 0);
 #endif
 	afs_put_vlocation(vlocation);
 
@@ -481,40 +476,25 @@ int afs_volume_release_fileserver(struct
 
 /*****************************************************************************/
 /*
- * match a volume hash record stored in the cache
+ * set the key for the index entry
  */
-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_volume_cache_match(void *target,
-						  const void *entry)
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_volume_cache_get_key(const void *cookie_netfs_data,
+					void *buffer, uint16_t bufmax)
 {
-	const struct afs_cache_vhash *vhash = entry;
-	struct afs_volume *volume = target;
-
-	_enter("{%u},{%u}", volume->type, vhash->vtype);
+	const struct afs_volume *volume = cookie_netfs_data;
+	uint16_t klen;
 
-	if (volume->type == vhash->vtype) {
-		_leave(" = SUCCESS");
-		return CACHEFS_MATCH_SUCCESS;
-	}
-
-	_leave(" = FAILED");
-	return CACHEFS_MATCH_FAILED;
-} /* end afs_volume_cache_match() */
-#endif
+	_enter("{%u},%p,%u", volume->type, buffer, bufmax);
 
-/*****************************************************************************/
-/*
- * update a volume hash record stored in the cache
- */
-#ifdef AFS_CACHING_SUPPORT
-static void afs_volume_cache_update(void *source, void *entry)
-{
-	struct afs_cache_vhash *vhash = entry;
-	struct afs_volume *volume = source;
+	klen = sizeof(volume->type);
+	if (klen > bufmax)
+		return 0;
 
-	_enter("");
+	memcpy(buffer, &volume->type, sizeof(volume->type));
 
-	vhash->vtype = volume->type;
+	_leave(" = %u", klen);
+	return klen;
 
-} /* end afs_volume_cache_update() */
+} /* end afs_volume_cache_get_key() */
 #endif
diff --git a/fs/afs/volume.h b/fs/afs/volume.h
index bfdcf19..fc9895a 100644
--- a/fs/afs/volume.h
+++ b/fs/afs/volume.h
@@ -12,11 +12,11 @@
 #ifndef _LINUX_AFS_VOLUME_H
 #define _LINUX_AFS_VOLUME_H
 
+#include <linux/fscache.h>
 #include "types.h"
 #include "fsclient.h"
 #include "kafstimod.h"
 #include "kafsasyncd.h"
-#include "cache.h"
 
 typedef enum {
 	AFS_VLUPD_SLEEP,		/* sleeping waiting for update timer to fire */
@@ -45,24 +45,6 @@ #define AFS_VOL_VTM_BAK	0x04 /* backup v
 	time_t			rtime;		/* last retrieval time */
 };
 
-#ifdef AFS_CACHING_SUPPORT
-extern struct cachefs_index_def afs_vlocation_cache_index_def;
-#endif
-
-/*****************************************************************************/
-/*
- * volume -> vnode hash table entry
- */
-struct afs_cache_vhash
-{
-	afs_voltype_t		vtype;		/* which volume variation */
-	uint8_t			hash_bucket;	/* which hash bucket this represents */
-} __attribute__((packed));
-
-#ifdef AFS_CACHING_SUPPORT
-extern struct cachefs_index_def afs_volume_cache_index_def;
-#endif
-
 /*****************************************************************************/
 /*
  * AFS volume location record
@@ -73,8 +55,8 @@ struct afs_vlocation
 	struct list_head	link;		/* link in cell volume location list */
 	struct afs_timer	timeout;	/* decaching timer */
 	struct afs_cell		*cell;		/* cell to which volume belongs */
-#ifdef AFS_CACHING_SUPPORT
-	struct cachefs_cookie	*cache;		/* caching cookie */
+#ifdef CONFIG_AFS_FSCACHE
+	struct fscache_cookie	*cache;		/* caching cookie */
 #endif
 	struct afs_cache_vlocation vldb;	/* volume information DB record */
 	struct afs_volume	*vols[3];	/* volume access record pointer (index by type) */
@@ -109,8 +91,8 @@ struct afs_volume
 	atomic_t		usage;
 	struct afs_cell		*cell;		/* cell to which belongs (unrefd ptr) */
 	struct afs_vlocation	*vlocation;	/* volume location */
-#ifdef AFS_CACHING_SUPPORT
-	struct cachefs_cookie	*cache;		/* caching cookie */
+#ifdef CONFIG_AFS_FSCACHE
+	struct fscache_cookie	*cache;		/* caching cookie */
 #endif
 	afs_volid_t		vid;		/* volume ID */
 	afs_voltype_t		type;		/* type of volume */
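
The kAFS conversion above replaces the old match/update pair with three callbacks: get_key supplies the index key, get_aux supplies coherency data, and check_aux compares the stored coherency data against the live object. The user-space sketch below mirrors the vnode case. The struct `demo_vnode`, the enum `demo_checkaux` and the `demo_*` function names are illustrative stand-ins rather than the kernel definitions, and the sketch assumes a 32-bit uniquifier and a 64-bit data version.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel types used in the patch. */
enum demo_checkaux { DEMO_AUX_OKAY, DEMO_AUX_OBSOLETE };

struct demo_vnode {
	uint32_t vnode;		/* index key */
	uint32_t unique;	/* uniquifier, stored as aux data */
	uint64_t version;	/* data version, stored as aux data */
};

/* Copy the key into buffer; return its length, or 0 if it won't fit. */
static uint16_t demo_get_key(const struct demo_vnode *v,
			     void *buffer, uint16_t bufmax)
{
	uint16_t klen = sizeof(v->vnode);

	if (klen > bufmax)
		return 0;
	memcpy(buffer, &v->vnode, klen);
	return klen;
}

/* Pack unique + version back to back; return length, or 0 if too big. */
static uint16_t demo_get_aux(const struct demo_vnode *v,
			     void *buffer, uint16_t bufmax)
{
	uint16_t dlen = sizeof(v->unique) + sizeof(v->version);
	uint8_t *p = buffer;

	if (dlen > bufmax)
		return 0;
	memcpy(p, &v->unique, sizeof(v->unique));
	memcpy(p + sizeof(v->unique), &v->version, sizeof(v->version));
	return dlen;
}

/* A cached object is usable only if both fields still match. */
static enum demo_checkaux demo_check_aux(const struct demo_vnode *v,
					 const void *buffer, uint16_t buflen)
{
	const uint8_t *p = buffer;
	uint16_t dlen = sizeof(v->unique) + sizeof(v->version);

	if (buflen != dlen)
		return DEMO_AUX_OBSOLETE;
	if (memcmp(p, &v->unique, sizeof(v->unique)) != 0)
		return DEMO_AUX_OBSOLETE;
	if (memcmp(p + sizeof(v->unique), &v->version,
		   sizeof(v->version)) != 0)
		return DEMO_AUX_OBSOLETE;
	return DEMO_AUX_OKAY;
}
```

Keeping the vnode number in the key and the uniquifier and data version in the aux data means a changed file is found under the same key and then rejected as obsolete, rather than being missed altogether.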


* [PATCH 5/7] NFS: Use local caching [try #13]
From: David Howells @ 2006-08-30 19:32 UTC (permalink / raw)
  To: torvalds, akpm, steved, trond.myklebust
  Cc: linux-fsdevel, linux-cachefs, nfsv4, linux-kernel

The attached patch makes it possible for the NFS filesystem to make use of the
network filesystem local caching service (FS-Cache).

To be able to use this, an updated mount program is required.  This can be
obtained from:

	http://people.redhat.com/steved/cachefs/util-linux/

To mount an NFS filesystem to use caching, add an "fsc" option to the mount:

	mount warthog:/ /a -o fsc

Signed-Off-By: David Howells <dhowells@redhat.com>
---
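
The server index key added in fs/nfs/fscache.c stores an IPv4 address in IPv4-mapped IPv6 form (ten zero bytes, two 0xff bytes, then the four address bytes), so that both address families share one fixed-size layout. A user-space sketch of that packing follows. The `demo_server_key` struct and the `demo_pack_ipv4_key()` helper are hypothetical names for illustration, and the port is assumed to be passed in already in network byte order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size key: version, port, 16 address bytes. */
struct demo_server_key {
	uint16_t nfsversion;
	uint16_t port;		/* network byte order, as in sin_port */
	uint8_t  addr[16];	/* IPv6, or IPv4-mapped IPv6 */
};

/*
 * Fill in a key for an IPv4 server.  Returns the key length, or 0 if the
 * caller's buffer is too small to hold it.
 */
static uint16_t demo_pack_ipv4_key(void *buffer, uint16_t bufmax,
				   uint16_t nfsversion, uint16_t port_be,
				   const uint8_t ipv4[4])
{
	struct demo_server_key key;

	if (sizeof(key) > bufmax)
		return 0;

	memset(&key, 0, sizeof(key));	/* bytes 0..9 of addr stay zero */
	key.nfsversion = nfsversion;
	key.port = port_be;
	key.addr[10] = 0xff;
	key.addr[11] = 0xff;
	memcpy(&key.addr[12], ipv4, 4);

	memcpy(buffer, &key, sizeof(key));
	return sizeof(key);
}
```

Unlike nfs_server_get_key() in the patch, this sketch checks the buffer size before writing; whether the real callback needs that check depends on how large a key buffer FS-Cache guarantees.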

 fs/Kconfig                 |    7 +
 fs/nfs/Makefile            |    1 
 fs/nfs/client.c            |   11 +
 fs/nfs/file.c              |   49 ++++-
 fs/nfs/fscache.c           |  348 ++++++++++++++++++++++++++++++++
 fs/nfs/fscache.h           |  479 ++++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/inode.c             |   21 ++
 fs/nfs/internal.h          |   32 +++
 fs/nfs/pagelist.c          |    3 
 fs/nfs/read.c              |   30 +++
 fs/nfs/super.c             |    1 
 fs/nfs/sysctl.c            |   43 ++++
 fs/nfs/write.c             |   11 +
 include/linux/nfs4_mount.h |    1 
 include/linux/nfs_fs.h     |    5 
 include/linux/nfs_fs_sb.h  |    5 
 include/linux/nfs_mount.h  |    1 
 17 files changed, 1038 insertions(+), 10 deletions(-)
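
Coherency for cached NFS file data in this patch rests on auxiliary data recorded with each cache object: the file size, mtime and ctime. If any of them differs at lookup time, the cached object is treated as obsolete. A user-space sketch of that comparison follows. The names `demo_auxdata`, `demo_attrs`, `demo_snapshot()` and `demo_aux_still_valid()` are assumptions made for illustration, and the fields are compared one by one rather than with memcmp() (which is what the patch uses) so that structure padding cannot affect the result.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the attributes used for coherency. */
struct demo_auxdata {
	int64_t mtime_sec, mtime_nsec;
	int64_t ctime_sec, ctime_nsec;
	int64_t size;
};

/* Hypothetical live inode attributes. */
struct demo_attrs {
	int64_t mtime_sec, mtime_nsec;
	int64_t ctime_sec, ctime_nsec;
	int64_t size;
};

/* Record the attributes to store alongside the cached data. */
static struct demo_auxdata demo_snapshot(const struct demo_attrs *a)
{
	struct demo_auxdata aux = {
		.mtime_sec = a->mtime_sec, .mtime_nsec = a->mtime_nsec,
		.ctime_sec = a->ctime_sec, .ctime_nsec = a->ctime_nsec,
		.size = a->size,
	};
	return aux;
}

/* True if the cached object may still be used for this inode. */
static bool demo_aux_still_valid(const struct demo_auxdata *aux,
				 const struct demo_attrs *a)
{
	return aux->size == a->size &&
	       aux->mtime_sec == a->mtime_sec &&
	       aux->mtime_nsec == a->mtime_nsec &&
	       aux->ctime_sec == a->ctime_sec &&
	       aux->ctime_nsec == a->ctime_nsec;
}
```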

diff --git a/fs/Kconfig b/fs/Kconfig
index eecc0ed..36d0051 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -1485,6 +1485,13 @@ config NFS_V4
 
 	  If unsure, say N.
 
+config NFS_FSCACHE
+	bool "Provide NFS client caching support (EXPERIMENTAL)"
+	depends on NFS_FS && FSCACHE && EXPERIMENTAL
+	help
+	  Say Y here if you want NFS data to be cached locally on disc through
+	  the general filesystem cache manager.
+
 config NFS_DIRECTIO
 	bool "Allow direct I/O on NFS files (EXPERIMENTAL)"
 	depends on NFS_FS && EXPERIMENTAL
diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index f4580b4..2af6f22 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -16,4 +16,5 @@ nfs-$(CONFIG_NFS_V4)	+= nfs4proc.o nfs4x
 			   nfs4namespace.o
 nfs-$(CONFIG_NFS_DIRECTIO) += direct.o
 nfs-$(CONFIG_SYSCTL) += sysctl.o
+nfs-$(CONFIG_NFS_FSCACHE) += fscache.o
 nfs-objs		:= $(nfs-y)
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index a4aa479..f2feebe 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -150,6 +150,8 @@ #ifdef CONFIG_NFS_V4
 	clp->cl_state = 1 << NFS4CLNT_LEASE_EXPIRED;
 #endif
 
+	nfs_fscache_get_client_cookie(clp);
+
 	return clp;
 
 error_3:
@@ -193,6 +195,8 @@ static void nfs_free_client(struct nfs_c
 
 	nfs4_shutdown_client(clp);
 
+	nfs_fscache_release_client_cookie(clp);
+
 	/* -EIO all pending I/O */
 	if (!IS_ERR(clp->cl_rpcclient))
 		rpc_shutdown_client(clp->cl_rpcclient);
@@ -1371,7 +1375,7 @@ static int nfs_volume_list_show(struct s
 
 	/* display header on line 1 */
 	if (v == SEQ_START_TOKEN) {
-		seq_puts(m, "NV SERVER   PORT DEV     FSID\n");
+		seq_puts(m, "NV SERVER   PORT DEV     FSID              FSC\n");
 		return 0;
 	}
 	/* display one transport per line on subsequent lines */
@@ -1385,12 +1389,13 @@ static int nfs_volume_list_show(struct s
 		 (unsigned long long) server->fsid.major,
 		 (unsigned long long) server->fsid.minor);
 
-	seq_printf(m, "v%d %02x%02x%02x%02x %4hx %-7s %-17s\n",
+	seq_printf(m, "v%d %02x%02x%02x%02x %4hx %-7s %-17s %s\n",
 		   clp->cl_nfsversion,
 		   NIPQUAD(clp->cl_addr.sin_addr),
 		   ntohs(clp->cl_addr.sin_port),
 		   dev,
-		   fsid);
+		   fsid,
+		   nfs_server_fscache_state(server));
 
 	return 0;
 }
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index a146ed3..b7ab97c 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -27,12 +27,14 @@ #include <linux/mm.h>
 #include <linux/slab.h>
 #include <linux/pagemap.h>
 #include <linux/smp_lock.h>
+#include <linux/buffer_head.h>
 
 #include <asm/uaccess.h>
 #include <asm/system.h>
 
 #include "delegation.h"
 #include "iostat.h"
+#include "internal.h"
 
 #define NFSDBG_FACILITY		NFSDBG_FILE
 
@@ -249,6 +251,10 @@ nfs_file_mmap(struct file * file, struct
 	status = nfs_revalidate_mapping(inode, file->f_mapping);
 	if (!status)
 		status = generic_file_mmap(file, vma);
+
+	if (status == 0)
+		nfs_fscache_install_vm_ops(inode, vma);
+
 	return status;
 }
 
@@ -301,6 +307,12 @@ static int nfs_commit_write(struct file 
 	return status;
 }
 
+/*
+ * partially or wholly invalidate a page
+ * - release the private state associated with a page if undergoing complete
+ *   page invalidation
+ * - caller holds page lock
+ */
 static void nfs_invalidate_page(struct page *page, unsigned long offset)
 {
 	struct inode *inode = page->mapping->host;
@@ -308,19 +320,47 @@ static void nfs_invalidate_page(struct p
 	/* Cancel any unstarted writes on this page */
 	if (offset == 0)
 		nfs_sync_inode_wait(inode, page->index, 1, FLUSH_INVALIDATE);
+
+	nfs_fscache_invalidate_page(page, inode, offset);
+
+	/* we can do this here as the bits are only set with the page lock
+	 * held, and our caller is holding that */
+	if (!page->private)
+		ClearPagePrivate(page);
 }
 
+/*
+ * release the private state associated with a page, if the page isn't busy
+ * - caller holds page lock
+ * - return true (may release) or false (may not)
+ */
 static int nfs_release_page(struct page *page, gfp_t gfp)
 {
-	if (gfp & __GFP_FS)
-		return !nfs_wb_page(page->mapping->host, page);
-	else
+	if ((gfp & __GFP_FS) == 0) {
 		/*
 		 * Avoid deadlock on nfs_wait_on_request().
 		 */
 		return 0;
+	}
+
+	if (nfs_wb_page(page->mapping->host, page) < 0)
+		return 0;
+
+	if (nfs_fscache_release_page(page) < 0)
+		return 0;
+
+	/* PG_private may have been set due to either caching or writing */
+	BUG_ON(page->private != 0);
+	ClearPagePrivate(page);
+
+	return 1;
 }
 
+/*
+ * Since we use page->private for our own nefarious purposes when using
+ * fscache, we have to override extra address space ops to prevent fs/buffer.c
+ * from getting confused, even though we may not have asked its opinion
+ */
 const struct address_space_operations nfs_file_aops = {
 	.readpage = nfs_readpage,
 	.readpages = nfs_readpages,
@@ -334,6 +374,9 @@ const struct address_space_operations nf
 #ifdef CONFIG_NFS_DIRECTIO
 	.direct_IO = nfs_direct_IO,
 #endif
+#ifdef CONFIG_NFS_FSCACHE
+	.sync_page	= block_sync_page,
+#endif
 };
 
 /* 
diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
new file mode 100644
index 0000000..94d5e3a
--- /dev/null
+++ b/fs/nfs/fscache.c
@@ -0,0 +1,348 @@
+/* fscache.c: NFS filesystem cache interface
+ *
+ * Copyright (C) 2006 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+
+#include <linux/config.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/nfs_fs.h>
+#include <linux/nfs_fs_sb.h>
+#include <linux/in6.h>
+
+#include "internal.h"
+
+/*
+ * Sysctl variables
+ */
+atomic_t nfs_fscache_to_pages;
+atomic_t nfs_fscache_from_pages;
+atomic_t nfs_fscache_uncache_page;
+int nfs_fscache_from_error;
+int nfs_fscache_to_error;
+
+#define NFSDBG_FACILITY		NFSDBG_FSCACHE
+
+/* the auxiliary data in the cache (used for coherency management) */
+struct nfs_fh_auxdata {
+	struct timespec	i_mtime;
+	struct timespec	i_ctime;
+	loff_t		i_size;
+};
+
+static struct fscache_netfs_operations nfs_cache_ops = {
+};
+
+struct fscache_netfs nfs_cache_netfs = {
+	.name			= "nfs",
+	.version		= 0,
+	.ops			= &nfs_cache_ops,
+};
+
+static const uint8_t nfs_cache_ipv6_wrapper_for_ipv4[12] = {
+	[0 ... 9]	= 0x00,
+	[10 ... 11]	= 0xff
+};
+
+struct nfs_server_key {
+	uint16_t nfsversion;
+	uint16_t port;
+	union {
+		struct {
+			uint8_t		ipv6wrapper[12];
+			struct in_addr	addr;
+		} ipv4_addr;
+		struct in6_addr ipv6_addr;
+	};
+};
+
+static uint16_t nfs_server_get_key(const void *cookie_netfs_data,
+				   void *buffer, uint16_t bufmax)
+{
+	const struct nfs_client *clp = cookie_netfs_data;
+	struct nfs_server_key *key = buffer;
+	uint16_t len = 0;
+
+	key->nfsversion = clp->cl_nfsversion;
+
+	switch (clp->cl_addr.sin_family) {
+	case AF_INET:
+		key->port = clp->cl_addr.sin_port;
+
+		memcpy(&key->ipv4_addr.ipv6wrapper,
+		       &nfs_cache_ipv6_wrapper_for_ipv4,
+		       sizeof(key->ipv4_addr.ipv6wrapper));
+		memcpy(&key->ipv4_addr.addr,
+		       &clp->cl_addr.sin_addr,
+		       sizeof(key->ipv4_addr.addr));
+		len = sizeof(struct nfs_server_key);
+		break;
+
+	case AF_INET6:
+		key->port = clp->cl_addr.sin_port;
+
+		memcpy(&key->ipv6_addr,
+		       &clp->cl_addr.sin_addr,
+		       sizeof(key->ipv6_addr));
+		len = sizeof(struct nfs_server_key);
+		break;
+
+	default:
+		len = 0;
+		printk(KERN_WARNING "NFS: Unknown network family '%d'\n",
+			clp->cl_addr.sin_family);
+		break;
+	}
+
+	return len;
+}
+
+/*
+ * the filesystem's root index is keyed on NFS version, server address and port
+ */
+struct fscache_cookie_def nfs_cache_server_index_def = {
+	.name		= "NFS.servers",
+	.type 		= FSCACHE_COOKIE_TYPE_INDEX,
+	.get_key	= nfs_server_get_key,
+};
+
+static uint16_t nfs_fh_get_key(const void *cookie_netfs_data,
+		void *buffer, uint16_t bufmax)
+{
+	const struct nfs_inode *nfsi = cookie_netfs_data;
+	uint16_t nsize;
+
+	/* set the file handle, refusing to overrun the key buffer */
+	nsize = nfsi->fh.size > bufmax ? 0 : nfsi->fh.size;
+	memcpy(buffer, nfsi->fh.data, nsize);
+	return nsize;
+}
+
+/*
+ * indication of pages that now have cache metadata retained
+ * - this function should mark the specified pages as now being cached
+ */
+static void nfs_fh_mark_pages_cached(void *cookie_netfs_data,
+				     struct address_space *mapping,
+				     struct pagevec *cached_pvec)
+{
+	struct nfs_inode *nfsi = cookie_netfs_data;
+	unsigned long loop;
+
+	dprintk("NFS: nfs_fh_mark_pages_cached: nfs_inode 0x%p pages %ld\n",
+		nfsi, cached_pvec->nr);
+
+	BUG_ON(!nfsi->fscache);
+
+	for (loop = 0; loop < cached_pvec->nr; loop++)
+		SetPageNfsCached(cached_pvec->pages[loop]);
+}
+
+/*
+ * get an extra reference on a read context
+ * - this function can be absent if the completion function doesn't
+ *   require a context
+ */
+static void nfs_fh_get_context(void *cookie_netfs_data, void *context)
+{
+	get_nfs_open_context(context);
+}
+
+/*
+ * release an extra reference on a read context
+ * - this function can be absent if the completion function doesn't
+ *   require a context
+ */
+static void nfs_fh_put_context(void *cookie_netfs_data, void *context)
+{
+	if (context)
+		put_nfs_open_context(context);
+}
+
+/*
+ * indication the cookie is no longer cached
+ * - this function is called when the backing store currently caching a cookie
+ *   is removed
+ * - the netfs should use this to clean up any markers indicating cached pages
+ * - this is mandatory for any object that may have data
+ */
+static void nfs_fh_now_uncached(void *cookie_netfs_data)
+{
+	struct nfs_inode *nfsi = cookie_netfs_data;
+	struct pagevec pvec;
+	pgoff_t first;
+	int loop, nr_pages;
+
+	pagevec_init(&pvec, 0);
+	first = 0;
+
+	dprintk("NFS: nfs_fh_now_uncached: nfs_inode 0x%p\n", nfsi);
+
+	for (;;) {
+		/* grab a bunch of pages to clean */
+		nr_pages = pagevec_lookup(&pvec,
+					  nfsi->vfs_inode.i_mapping,
+					  first,
+					  PAGEVEC_SIZE - pagevec_count(&pvec));
+		if (!nr_pages)
+			break;
+
+		for (loop = 0; loop < nr_pages; loop++)
+			ClearPageNfsCached(pvec.pages[loop]);
+
+		first = pvec.pages[nr_pages - 1]->index + 1;
+
+		pvec.nr = nr_pages;
+		pagevec_release(&pvec);
+		cond_resched();
+	}
+}
+
+/*
+ * get certain file attributes from the netfs data
+ * - this function can be absent for an index
+ * - not permitted to return an error
+ * - the netfs data from the cookie being used as the source is
+ *   presented
+ */
+static void nfs_fh_get_attr(const void *cookie_netfs_data, uint64_t *size)
+{
+	const struct nfs_inode *nfsi = cookie_netfs_data;
+
+	*size = nfsi->vfs_inode.i_size;
+}
+
+/*
+ * get the auxiliary data from netfs data
+ * - this function can be absent if the index carries no state data
+ * - should store the auxiliary data in the buffer
+ * - should return the amount of data stored
+ * - not permitted to return an error
+ * - the netfs data from the cookie being used as the source is
+ *   presented
+ */
+static uint16_t nfs_fh_get_aux(const void *cookie_netfs_data,
+			       void *buffer, uint16_t bufmax)
+{
+	struct nfs_fh_auxdata auxdata;
+	const struct nfs_inode *nfsi = cookie_netfs_data;
+
+	auxdata.i_size = nfsi->vfs_inode.i_size;
+	auxdata.i_mtime = nfsi->vfs_inode.i_mtime;
+	auxdata.i_ctime = nfsi->vfs_inode.i_ctime;
+
+	if (bufmax > sizeof(auxdata))
+		bufmax = sizeof(auxdata);
+
+	memcpy(buffer, &auxdata, bufmax);
+	return bufmax;
+}
+
+/*
+ * consult the netfs about the state of an object
+ * - this function can be absent if the index carries no state data
+ * - the netfs data from the cookie being used as the target is
+ *   presented, as is the auxiliary data
+ */
+static fscache_checkaux_t nfs_fh_check_aux(void *cookie_netfs_data,
+					   const void *data, uint16_t datalen)
+{
+	struct nfs_fh_auxdata auxdata;
+	struct nfs_inode *nfsi = cookie_netfs_data;
+
+	if (datalen > sizeof(auxdata))
+		return FSCACHE_CHECKAUX_OBSOLETE;
+
+	auxdata.i_size = nfsi->vfs_inode.i_size;
+	auxdata.i_mtime = nfsi->vfs_inode.i_mtime;
+	auxdata.i_ctime = nfsi->vfs_inode.i_ctime;
+
+	if (memcmp(data, &auxdata, datalen) != 0)
+		return FSCACHE_CHECKAUX_OBSOLETE;
+
+	return FSCACHE_CHECKAUX_OKAY;
+}
+
+/*
+ * the primary index for each server is simply made up of a series of NFS file
+ * handles
+ */
+struct fscache_cookie_def nfs_cache_fh_index_def = {
+	.name			= "NFS.fh",
+	.type			= FSCACHE_COOKIE_TYPE_DATAFILE,
+	.get_key		= nfs_fh_get_key,
+	.get_attr		= nfs_fh_get_attr,
+	.get_aux		= nfs_fh_get_aux,
+	.check_aux		= nfs_fh_check_aux,
+	.get_context		= nfs_fh_get_context,
+	.put_context		= nfs_fh_put_context,
+	.mark_pages_cached	= nfs_fh_mark_pages_cached,
+	.now_uncached		= nfs_fh_now_uncached,
+};
+
+static int nfs_file_page_mkwrite(struct vm_area_struct *vma, struct page *page)
+{
+	wait_on_page_fs_misc(page);
+	return 0;
+}
+
+struct vm_operations_struct nfs_fs_vm_operations = {
+	.nopage		= filemap_nopage,
+	.populate	= filemap_populate,
+	.page_mkwrite	= nfs_file_page_mkwrite,
+};
+
+/*
+ * handle completion of a page being stored in the cache
+ */
+void nfs_readpage_to_fscache_complete(struct page *page, void *data, int error)
+{
+	dfprintk(FSCACHE,
+		"NFS:     readpage_to_fscache_complete (p:%p(i:%lx f:%lx)/%d)\n",
+		page, page->index, page->flags, error);
+
+	end_page_fs_misc(page);
+}
+
+/*
+ * handle completion of a page being read from the cache
+ * - called in process (keventd) context
+ */
+void nfs_readpage_from_fscache_complete(struct page *page,
+					void *context,
+					int error)
+{
+	dfprintk(FSCACHE,
+		 "NFS: readpage_from_fscache_complete (0x%p/0x%p/%d)\n",
+		 page, context, error);
+
+	/* if the read completes with an error, fall back to reading the page
+	 * from the server; only unlock it here if that read can't be started */
+	if (!error) {
+		SetPageUptodate(page);
+		unlock_page(page);
+	} else {
+		error = nfs_readpage_async(context, page->mapping->host, page);
+		if (error)
+			unlock_page(page);
+	}
+}
+
+/*
+ * handle completion of a page being written to the cache
+ * - really need to synchronise the end of writeback, probably using a page
+ *   flag, but for the moment we disable caching on writable files
+ */
+void nfs_writepage_to_fscache_complete(struct page *page,
+				       void *data,
+				       int error)
+{
+}
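The get_aux/check_aux pair in fs/nfs/fscache.c above amounts to a
snapshot-and-compare of inode attributes.  A minimal userspace sketch of that
idea follows.  The demo_* names and the three-field struct are hypothetical
stand-ins, not the real struct nfs_fh_auxdata, and the memset() calls are an
addition of this sketch (a common way to keep padding bytes out of a
memcmp()), not something the patch itself does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the attributes recorded per file. */
struct demo_auxdata {
	int64_t size;
	int64_t mtime_sec;
	int64_t ctime_sec;
};

enum demo_checkaux {
	DEMO_CHECKAUX_OKAY,
	DEMO_CHECKAUX_OBSOLETE,
};

/* Snapshot the current attributes into a buffer of at most bufmax bytes. */
static uint16_t demo_get_aux(const struct demo_auxdata *cur,
			     void *buffer, uint16_t bufmax)
{
	struct demo_auxdata aux;

	assert(buffer != NULL);

	/* Assumption of this sketch: zero the struct before filling it. */
	memset(&aux, 0, sizeof(aux));
	aux.size = cur->size;
	aux.mtime_sec = cur->mtime_sec;
	aux.ctime_sec = cur->ctime_sec;

	if (bufmax > sizeof(aux))
		bufmax = (uint16_t)sizeof(aux);

	memcpy(buffer, &aux, bufmax);
	return bufmax;
}

/* Compare a stored snapshot against the current attributes. */
static enum demo_checkaux demo_check_aux(const struct demo_auxdata *cur,
					 const void *data, uint16_t datalen)
{
	struct demo_auxdata aux;

	/* A snapshot longer than we can rebuild cannot be validated. */
	if (datalen > sizeof(aux))
		return DEMO_CHECKAUX_OBSOLETE;

	memset(&aux, 0, sizeof(aux));
	aux.size = cur->size;
	aux.mtime_sec = cur->mtime_sec;
	aux.ctime_sec = cur->ctime_sec;

	if (memcmp(data, &aux, datalen) != 0)
		return DEMO_CHECKAUX_OBSOLETE;

	return DEMO_CHECKAUX_OKAY;
}
```

In this model a snapshot taken with demo_get_aux() checks as OKAY until any of
the three attributes changes, after which it checks as OBSOLETE.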
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
new file mode 100644
index 0000000..8899f16
--- /dev/null
+++ b/fs/nfs/fscache.h
@@ -0,0 +1,479 @@
+/* fscache.h: NFS filesystem cache interface definitions
+ *
+ * Copyright (C) 2006 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _NFS_FSCACHE_H
+#define _NFS_FSCACHE_H
+
+#include <linux/nfs_fs.h>
+#include <linux/nfs_mount.h>
+#include <linux/nfs4_mount.h>
+
+#ifdef CONFIG_NFS_FSCACHE
+#include <linux/fscache.h>
+
+extern struct fscache_netfs nfs_cache_netfs;
+extern struct fscache_cookie_def nfs_cache_server_index_def;
+extern struct fscache_cookie_def nfs_cache_fh_index_def;
+extern struct vm_operations_struct nfs_fs_vm_operations;
+
+extern void nfs_invalidatepage(struct page *, unsigned long);
+extern int nfs_releasepage(struct page *, gfp_t);
+
+extern atomic_t nfs_fscache_to_pages;
+extern atomic_t nfs_fscache_from_pages;
+extern atomic_t nfs_fscache_uncache_page;
+extern int nfs_fscache_from_error;
+extern int nfs_fscache_to_error;
+
+/*
+ * register NFS for caching
+ */
+static inline int nfs_fscache_register(void)
+{
+	return fscache_register_netfs(&nfs_cache_netfs);
+}
+
+/*
+ * unregister NFS for caching
+ */
+static inline void nfs_fscache_unregister(void)
+{
+	fscache_unregister_netfs(&nfs_cache_netfs);
+}
+
+/*
+ * get the per-client index cookie for an NFS client, regardless of the mount
+ * flags
+ * - we always try and get an index cookie for the client, but get filehandle
+ *   cookies on a per-superblock basis, depending on the mount flags
+ */
+static inline void nfs_fscache_get_client_cookie(struct nfs_client *clp)
+{
+	/* create a cache index for looking up filehandles */
+	clp->fscache = fscache_acquire_cookie(nfs_cache_netfs.primary_index,
+					      &nfs_cache_server_index_def,
+					      clp);
+	dfprintk(FSCACHE,"NFS: get client cookie (0x%p/0x%p)\n",
+		 clp, clp->fscache);
+}
+
+/*
+ * dispose of a per-client cookie
+ */
+static inline void nfs_fscache_release_client_cookie(struct nfs_client *clp)
+{
+	dfprintk(FSCACHE,"NFS: releasing client cookie (0x%p/0x%p)\n",
+		clp, clp->fscache);
+
+	fscache_relinquish_cookie(clp->fscache, 0);
+	clp->fscache = NULL;
+}
+
+/*
+ * indicate the client caching state as readable text
+ */
+static inline const char *nfs_server_fscache_state(struct nfs_server *server)
+{
+	if (server->nfs_client->fscache && (server->flags & NFS_MOUNT_FSCACHE))
+		return "yes";
+	return "no ";
+}
+
+/*
+ * get the per-filehandle cookie for an NFS inode
+ */
+static inline void nfs_fscache_get_fh_cookie(struct super_block *sb,
+					     struct nfs_inode *nfsi,
+					     int maycache)
+{
+	nfsi->fscache = NULL;
+	if (maycache && (NFS_SB(sb)->flags & NFS_MOUNT_FSCACHE)) {
+		nfsi->fscache = fscache_acquire_cookie(
+			NFS_SB(sb)->nfs_client->fscache,
+			&nfs_cache_fh_index_def,
+			nfsi);
+
+		fscache_set_i_size(nfsi->fscache, nfsi->vfs_inode.i_size);
+
+		dfprintk(FSCACHE, "NFS: get FH cookie (0x%p/0x%p/0x%p)\n",
+			 sb, nfsi, nfsi->fscache);
+	}
+}
+
+/*
+ * change the filesize associated with a per-filehandle cookie
+ */
+static inline void nfs_fscache_set_size(struct nfs_server *server,
+					struct nfs_inode *nfsi,
+					loff_t i_size)
+{
+	fscache_set_i_size(nfsi->fscache, i_size);
+}
+
+/*
+ * replace a per-filehandle cookie due to revalidation detecting a file having
+ * changed on the server
+ */
+static inline void nfs_fscache_renew_fh_cookie(struct nfs_server *server,
+					       struct nfs_inode *nfsi)
+{
+	struct fscache_cookie *old = nfsi->fscache;
+
+	if (nfsi->fscache) {
+		/* retire the current fscache cookie and get a new one */
+		fscache_relinquish_cookie(nfsi->fscache, 1);
+
+		nfsi->fscache = fscache_acquire_cookie(
+			server->nfs_client->fscache,
+			&nfs_cache_fh_index_def,
+			nfsi);
+		fscache_set_i_size(nfsi->fscache, nfsi->vfs_inode.i_size);
+
+		dfprintk(FSCACHE,
+			 "NFS: revalidation new cookie (0x%p/0x%p/0x%p/0x%p)\n",
+			 server, nfsi, old, nfsi->fscache);
+	}
+}
+
+/*
+ * release a per-filehandle cookie
+ */
+static inline void nfs_fscache_release_fh_cookie(struct nfs_server *server,
+						 struct nfs_inode *nfsi)
+{
+	dfprintk(FSCACHE, "NFS: clear cookie (0x%p/0x%p)\n",
+		 nfsi, nfsi->fscache);
+
+	fscache_relinquish_cookie(nfsi->fscache, 0);
+	nfsi->fscache = NULL;
+}
+
+/*
+ * retire a per-filehandle cookie, destroying the data attached to it
+ */
+static inline void nfs_fscache_zap_fh_cookie(struct nfs_server *server,
+					     struct nfs_inode *nfsi)
+{
+	dfprintk(FSCACHE,"NFS: zapping cookie (0x%p/0x%p)\n",
+		nfsi, nfsi->fscache);
+
+	fscache_relinquish_cookie(nfsi->fscache, 1);
+	nfsi->fscache = NULL;
+}
+
+/*
+ * turn off the cache with regard to a filehandle cookie if opened for writing,
+ * invalidating all the pages in the page cache relating to the associated
+ * inode to clear the per-page caching
+ */
+static inline void nfs_fscache_disable_fh_cookie(struct inode *inode)
+{
+	if (NFS_I(inode)->fscache) {
+		dfprintk(FSCACHE,
+			 "NFS: nfsi 0x%p turning cache off\n", NFS_I(inode));
+
+		/* Need to invalidate any mapped pages that were read in before
+		 * turning off the cache.
+		 */
+		if (inode->i_mapping && inode->i_mapping->nrpages)
+			invalidate_inode_pages2(inode->i_mapping);
+
+		nfs_fscache_zap_fh_cookie(NFS_SERVER(inode), NFS_I(inode));
+	}
+}
+
+/*
+ * install the VM ops for mmap() of an NFS file so that we can hold up writes
+ * to pages on shared writable mappings until the store to the cache is
+ * complete
+ */
+static inline void nfs_fscache_install_vm_ops(struct inode *inode,
+					      struct vm_area_struct *vma)
+{
+	if (NFS_I(inode)->fscache)
+		vma->vm_ops = &nfs_fs_vm_operations;
+}
+
+/*
+ * release the caching state associated with a page, if the page isn't busy
+ * interacting with the cache
+ */
+static inline int nfs_fscache_release_page(struct page *page)
+{
+	if (PageFsMisc(page))
+		return -EBUSY;
+
+	if (PageNfsCached(page)) {
+		struct nfs_inode *nfsi = NFS_I(page->mapping->host);
+
+		BUG_ON(!nfsi->fscache);
+
+		dfprintk(FSCACHE, "NFS: fscache releasepage (0x%p/0x%p/0x%p)\n",
+			 nfsi->fscache, page, nfsi);
+
+		fscache_uncache_page(nfsi->fscache, page);
+		atomic_inc(&nfs_fscache_uncache_page);
+		ClearPageNfsCached(page);
+	}
+
+	return 0;
+}
+
+/*
+ * release the caching state associated with a page if undergoing complete page
+ * invalidation
+ */
+static inline void nfs_fscache_invalidate_page(struct page *page,
+					       struct inode *inode,
+					       unsigned long offset)
+{
+	struct nfs_inode *nfsi = NFS_I(page->mapping->host);
+
+	if (PageNfsCached(page)) {
+		BUG_ON(!nfsi->fscache);
+
+		dfprintk(FSCACHE,
+			 "NFS: fscache invalidatepage (0x%p/0x%p/0x%p)\n",
+			 nfsi->fscache, page, nfsi);
+
+		wait_on_page_fs_misc(page);
+
+		if (offset == 0) {
+			BUG_ON(!PageLocked(page));
+			if (!PageWriteback(page)) {
+				fscache_uncache_page(nfsi->fscache, page);
+				atomic_inc(&nfs_fscache_uncache_page);
+				ClearPageNfsCached(page);
+			}
+		}
+	}
+}
+
+/*
+ * store a newly fetched page in fscache
+ */
+extern void nfs_readpage_to_fscache_complete(struct page *, void *, int);
+
+static inline void nfs_readpage_to_fscache(struct inode *inode,
+					   struct page *page,
+					   int sync)
+{
+	int ret;
+
+	if (PageNfsCached(page)) {
+		dfprintk(FSCACHE,
+			 "NFS: "
+			 "readpage_to_fscache(fsc:%p/p:%p(i:%lx f:%lx)/%d)\n",
+			 NFS_I(inode)->fscache, page, page->index, page->flags,
+			 sync);
+
+		if (TestSetPageFsMisc(page))
+			BUG();
+
+		ret = fscache_write_page(NFS_I(inode)->fscache, page,
+					 nfs_readpage_to_fscache_complete,
+					 NULL, GFP_KERNEL);
+		dfprintk(FSCACHE,
+			 "NFS:     "
+			 "readpage_to_fscache: p:%p(i:%lu f:%lx) ret %d\n",
+			 page, page->index, page->flags, ret);
+
+		if (ret != 0) {
+			fscache_uncache_page(NFS_I(inode)->fscache, page);
+			atomic_inc(&nfs_fscache_uncache_page);
+			ClearPageNfsCached(page);
+			end_page_fs_misc(page);
+			nfs_fscache_to_error = ret;
+		} else {
+			atomic_inc(&nfs_fscache_to_pages);
+		}
+	}
+}
+
+/*
+ * retrieve a page from fscache
+ */
+extern void nfs_readpage_from_fscache_complete(struct page *, void *, int);
+
+static inline
+int nfs_readpage_from_fscache(struct nfs_open_context *ctx,
+			      struct inode *inode,
+			      struct page *page)
+{
+	int ret;
+
+	if (!NFS_I(inode)->fscache)
+		return 1;
+
+	dfprintk(FSCACHE,
+		 "NFS: readpage_from_fscache(fsc:%p/p:%p(i:%lx f:%lx)/0x%p)\n",
+		 NFS_I(inode)->fscache, page, page->index, page->flags, inode);
+
+	ret = fscache_read_or_alloc_page(NFS_I(inode)->fscache,
+					 page,
+					 nfs_readpage_from_fscache_complete,
+					 ctx,
+					 GFP_KERNEL);
+
+	switch (ret) {
+	case 0: /* read BIO submitted (page in fscache) */
+		dfprintk(FSCACHE,
+			 "NFS:    readpage_from_fscache: BIO submitted\n");
+		atomic_inc(&nfs_fscache_from_pages);
+		return ret;
+
+	case -ENOBUFS: /* inode not in cache */
+	case -ENODATA: /* page not in cache */
+		dfprintk(FSCACHE,
+			 "NFS:    readpage_from_fscache error %d\n", ret);
+		return 1;
+
+	default:
+		dfprintk(FSCACHE, "NFS:    readpage_from_fscache %d\n", ret);
+		nfs_fscache_from_error = ret;
+	}
+	return ret;
+}
+
+/*
+ * retrieve a set of pages from fscache
+ */
+static inline int nfs_readpages_from_fscache(struct nfs_open_context *ctx,
+					     struct inode *inode,
+					     struct address_space *mapping,
+					     struct list_head *pages,
+					     unsigned *nr_pages)
+{
+	int ret, npages = *nr_pages;
+
+	if (!NFS_I(inode)->fscache)
+		return 1;
+
+	dfprintk(FSCACHE,
+		 "NFS: nfs_getpages_from_fscache (0x%p/%u/0x%p)\n",
+		 NFS_I(inode)->fscache, *nr_pages, inode);
+
+	ret = fscache_read_or_alloc_pages(NFS_I(inode)->fscache,
+					  mapping, pages, nr_pages,
+					  nfs_readpage_from_fscache_complete,
+					  ctx,
+					  mapping_gfp_mask(mapping));
+
+
+	switch (ret) {
+	case 0: /* read BIO submitted (page in fscache) */
+		BUG_ON(!list_empty(pages));
+		BUG_ON(*nr_pages != 0);
+		dfprintk(FSCACHE,
+			 "NFS: nfs_getpages_from_fscache: BIO submitted\n");
+
+		atomic_add(npages, &nfs_fscache_from_pages);
+		return ret;
+
+	case -ENOBUFS: /* inode not in cache */
+	case -ENODATA: /* page not in cache */
+		dfprintk(FSCACHE,
+			 "NFS: nfs_getpages_from_fscache: no page: %d\n", ret);
+		return 1;
+
+	default:
+		dfprintk(FSCACHE,
+			 "NFS: nfs_getpages_from_fscache: ret  %d\n", ret);
+		nfs_fscache_from_error = ret;
+	}
+
+	return ret;
+}
+
+/*
+ * store an updated page in fscache
+ */
+extern void nfs_writepage_to_fscache_complete(struct page *page, void *data, int error);
+
+static inline void nfs_writepage_to_fscache(struct inode *inode,
+					    struct page *page)
+{
+	int error;
+
+	if (PageNfsCached(page) && NFS_I(inode)->fscache) {
+		dfprintk(FSCACHE,
+			 "NFS: writepage_to_fscache (0x%p/0x%p/0x%p)\n",
+			 NFS_I(inode)->fscache, page, inode);
+
+		error = fscache_write_page(NFS_I(inode)->fscache, page,
+					   nfs_writepage_to_fscache_complete,
+					   NULL, GFP_KERNEL);
+		if (error != 0) {
+			dfprintk(FSCACHE,
+				 "NFS:    fscache_write_page error %d\n",
+				 error);
+			fscache_uncache_page(NFS_I(inode)->fscache, page);
+		}
+	}
+}
+
+#else /* CONFIG_NFS_FSCACHE */
+static inline int nfs_fscache_register(void) { return 0; }
+static inline void nfs_fscache_unregister(void) {}
+static inline void nfs_fscache_get_client_cookie(struct nfs_client *clp) {}
+static inline void nfs4_fscache_get_client_cookie(struct nfs_client *clp) {}
+static inline void nfs_fscache_release_client_cookie(struct nfs_client *clp) {}
+static inline const char *nfs_server_fscache_state(struct nfs_server *server) { return "no "; }
+
+static inline void nfs_fscache_get_fh_cookie(struct super_block *sb,
+					     struct nfs_inode *nfsi,
+					     int maycache)
+{
+}
+static inline void nfs_fscache_set_size(struct nfs_server *server,
+					struct nfs_inode *nfsi,
+					loff_t i_size)
+{
+}
+static inline void nfs_fscache_release_fh_cookie(struct nfs_server *server,
+						 struct nfs_inode *nfsi)
+{
+}
+static inline void nfs_fscache_zap_fh_cookie(struct nfs_server *server, struct nfs_inode *nfsi) {}
+static inline void nfs_fscache_renew_fh_cookie(struct nfs_server *server, struct nfs_inode *nfsi) {}
+static inline void nfs_fscache_disable_fh_cookie(struct inode *inode) {}
+static inline void nfs_fscache_install_vm_ops(struct inode *inode, struct vm_area_struct *vma) {}
+static inline int nfs_fscache_release_page(struct page *page)
+{
+	return 1; /* True: may release page */
+}
+static inline void nfs_fscache_invalidate_page(struct page *page,
+					       struct inode *inode,
+					       unsigned long offset)
+{
+}
+static inline void nfs_readpage_to_fscache(struct inode *inode, struct page *page, int sync) {}
+static inline int nfs_readpage_from_fscache(struct nfs_open_context *ctx,
+					    struct inode *inode, struct page *page)
+{
+	return -ENOBUFS;
+}
+static inline int nfs_readpages_from_fscache(struct nfs_open_context *ctx,
+					     struct inode *inode,
+					     struct address_space *mapping,
+					     struct list_head *pages,
+					     unsigned *nr_pages)
+{
+	return -ENOBUFS;
+}
+
+static inline void nfs_writepage_to_fscache(struct inode *inode, struct page *page)
+{
+	BUG_ON(PageNfsCached(page));
+}
+
+#endif /* CONFIG_NFS_FSCACHE */
+#endif /* _NFS_FSCACHE_H */
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index c7d34b7..a7b1e20 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -78,6 +78,7 @@ void nfs_clear_inode(struct inode *inode
 	BUG_ON(atomic_read(&NFS_I(inode)->data_updates) != 0);
 	nfs_zap_acl_cache(inode);
 	nfs_access_zap_cache(inode);
+	nfs_fscache_release_fh_cookie(NFS_SERVER(inode), NFS_I(inode));
 }
 
 /**
@@ -123,6 +124,8 @@ void nfs_zap_caches(struct inode *inode)
 	spin_lock(&inode->i_lock);
 	nfs_zap_caches_locked(inode);
 	spin_unlock(&inode->i_lock);
+
+	nfs_fscache_zap_fh_cookie(NFS_SERVER(inode), NFS_I(inode));
 }
 
 static void nfs_zap_acl_cache(struct inode *inode)
@@ -201,6 +204,7 @@ nfs_fhget(struct super_block *sb, struct
 	};
 	struct inode *inode = ERR_PTR(-ENOENT);
 	unsigned long hash;
+	int maycache = 1;
 
 	if ((fattr->valid & NFS_ATTR_FATTR) == 0)
 		goto out_no_inode;
@@ -252,6 +256,7 @@ nfs_fhget(struct super_block *sb, struct
 				else
 					inode->i_op = &nfs_mountpoint_inode_operations;
 				inode->i_fop = NULL;
+				maycache = 0;
 			}
 		} else if (S_ISLNK(inode->i_mode))
 			inode->i_op = &nfs_symlink_inode_operations;
@@ -284,6 +289,8 @@ nfs_fhget(struct super_block *sb, struct
 		memset(nfsi->cookieverf, 0, sizeof(nfsi->cookieverf));
 		nfsi->access_cache = RB_ROOT;
 
+		nfs_fscache_get_fh_cookie(sb, nfsi, maycache);
+
 		unlock_new_inode(inode);
 	} else
 		nfs_refresh_inode(inode, fattr);
@@ -366,6 +373,7 @@ void nfs_setattr_update_inode(struct ino
 	if ((attr->ia_valid & ATTR_SIZE) != 0) {
 		nfs_inc_stats(inode, NFSIOS_SETATTRTRUNC);
 		inode->i_size = attr->ia_size;
+		nfs_fscache_set_size(NFS_SERVER(inode), NFS_I(inode), inode->i_size);
 		vmtruncate(inode, attr->ia_size);
 	}
 }
@@ -550,6 +558,8 @@ int nfs_open(struct inode *inode, struct
 	ctx->mode = filp->f_mode;
 	nfs_file_set_open_context(filp, ctx);
 	put_nfs_open_context(ctx);
+	if ((filp->f_flags & O_ACCMODE) != O_RDONLY)
+		nfs_fscache_disable_fh_cookie(inode);
 	return 0;
 }
 
@@ -688,6 +698,8 @@ int nfs_revalidate_mapping(struct inode 
 		}
 		spin_unlock(&inode->i_lock);
 
+		nfs_fscache_renew_fh_cookie(NFS_SERVER(inode), nfsi);
+
 		dfprintk(PAGECACHE, "NFS: (%s/%Ld) data cache invalidated\n",
 				inode->i_sb->s_id,
 				(long long)NFS_FILEID(inode));
@@ -921,11 +933,13 @@ static int nfs_update_inode(struct inode
 			if (data_stable) {
 				inode->i_size = new_isize;
 				invalid |= NFS_INO_INVALID_DATA;
+				nfs_fscache_set_size(NFS_SERVER(inode), nfsi, inode->i_size);
 			}
 			invalid |= NFS_INO_INVALID_ATTR;
 		} else if (new_isize > cur_isize) {
 			inode->i_size = new_isize;
 			invalid |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA;
+			nfs_fscache_set_size(NFS_SERVER(inode), nfsi, inode->i_size);
 		}
 		nfsi->cache_change_attribute = jiffies;
 		dprintk("NFS: isize change on server for file %s/%ld\n",
@@ -1140,6 +1154,10 @@ static int __init init_nfs_fs(void)
 {
 	int err;
 
+	err = nfs_fscache_register();
+	if (err < 0)
+		goto out6;
+
 	err = nfs_fs_proc_init();
 	if (err)
 		goto out5;
@@ -1186,6 +1204,8 @@ out3:
 out4:
 	nfs_fs_proc_exit();
 out5:
+	nfs_fscache_unregister();
+out6:
 	return err;
 }
 
@@ -1196,6 +1216,7 @@ static void __exit exit_nfs_fs(void)
 	nfs_destroy_readpagecache();
 	nfs_destroy_inodecache();
 	nfs_destroy_nfspagecache();
+	nfs_fscache_unregister();
 #ifdef CONFIG_PROC_FS
 	rpc_proc_unregister("nfs");
 #endif
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index bea0b01..2fce9bd 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -4,6 +4,30 @@
 
 #include <linux/mount.h>
 
+#define NFS_PAGE_WRITING	0
+#define NFS_PAGE_CACHED		1
+
+#define PageNfsBit(bit, page)		test_bit(bit, &(page)->private)
+
+#define SetPageNfsBit(bit, page)		\
+do {						\
+	SetPagePrivate((page));			\
+	set_bit(bit, &(page)->private);		\
+} while(0)
+
+#define ClearPageNfsBit(bit, page)		\
+do {						\
+	clear_bit(bit, &(page)->private);	\
+} while(0)
+
+#define PageNfsWriting(page)		PageNfsBit(NFS_PAGE_WRITING, (page))
+#define SetPageNfsWriting(page)		SetPageNfsBit(NFS_PAGE_WRITING, (page))
+#define ClearPageNfsWriting(page)	ClearPageNfsBit(NFS_PAGE_WRITING, (page))
+
+#define PageNfsCached(page)		PageNfsBit(NFS_PAGE_CACHED, (page))
+#define SetPageNfsCached(page)		SetPageNfsBit(NFS_PAGE_CACHED, (page))
+#define ClearPageNfsCached(page)	ClearPageNfsBit(NFS_PAGE_CACHED, (page))
+
 struct nfs_string;
 struct nfs_mount_data;
 struct nfs4_mount_data;
@@ -27,6 +51,11 @@ struct nfs_clone_mount {
 	rpc_authflavor_t authflavor;
 };
 
+/*
+ * include filesystem caching stuff here
+ */
+#include "fscache.h"
+
 /* client.c */
 extern struct rpc_program nfs_program;
 
@@ -153,6 +182,9 @@ extern int nfs4_path_walk(struct nfs_ser
 			  const char *path);
 #endif
 
+/* read.c */
+extern int nfs_readpage_async(struct nfs_open_context *, struct inode *, struct page *);
+
 /*
  * Determine the device name as a string
  */
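The NFS_PAGE_* macros added to fs/nfs/internal.h above keep two independent
flags in page->private.  The following is a rough userspace model of that
bookkeeping, assuming a hypothetical struct demo_page in place of struct page
and plain (non-atomic) bit operations in place of set_bit()/clear_bit().  As
in the macros, setting a bit also marks the page as having private data, and
clearing a bit leaves that mark alone.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_PAGE_WRITING	0
#define DEMO_PAGE_CACHED	1

/* Hypothetical stand-in for the two fields of struct page used here. */
struct demo_page {
	bool has_private;	/* models the PG_private page flag */
	unsigned long private;	/* models page->private */
};

/* Models PageNfsBit(): test one flag bit. */
static bool demo_test_bit(const struct demo_page *page, int bit)
{
	return (page->private >> bit) & 1UL;
}

/* Models SetPageNfsBit(): mark the page private, then set the bit. */
static void demo_set_bit(struct demo_page *page, int bit)
{
	assert(bit >= 0 && bit <= DEMO_PAGE_CACHED);
	page->has_private = true;
	page->private |= 1UL << bit;
}

/* Models ClearPageNfsBit(): clear the bit only. */
static void demo_clear_bit(struct demo_page *page, int bit)
{
	assert(bit >= 0 && bit <= DEMO_PAGE_CACHED);
	page->private &= ~(1UL << bit);
}
```

Because the two bits are independent, a page can be marked as cached without
being marked as under writeback, and vice versa.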
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 36e902a..c45f724 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -17,6 +17,7 @@ #include <linux/nfs4.h>
 #include <linux/nfs_page.h>
 #include <linux/nfs_fs.h>
 #include <linux/nfs_mount.h>
+#include "internal.h"
 
 #define NFS_PARANOIA 1
 
@@ -84,7 +85,7 @@ nfs_create_request(struct nfs_open_conte
 	atomic_set(&req->wb_complete, 0);
 	req->wb_index	= page->index;
 	page_cache_get(page);
-	BUG_ON(PagePrivate(page));
+	BUG_ON(PageNfsWriting(page));
 	BUG_ON(!PageLocked(page));
 	BUG_ON(page->mapping->host != inode);
 	req->wb_offset  = offset;
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index b3be058..253d2a8 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -26,11 +26,13 @@ #include <linux/pagemap.h>
 #include <linux/sunrpc/clnt.h>
 #include <linux/nfs_fs.h>
 #include <linux/nfs_page.h>
+#include <linux/nfs_mount.h>
 #include <linux/smp_lock.h>
 
 #include <asm/system.h>
 
 #include "iostat.h"
+#include "internal.h"
 
 #define NFSDBG_FACILITY		NFSDBG_PAGECACHE
 
@@ -207,13 +209,18 @@ static int nfs_readpage_sync(struct nfs_
 		SetPageUptodate(page);
 	result = 0;
 
+	nfs_readpage_to_fscache(inode, page, 1);
+	unlock_page(page);
+
+	return result;
+
 io_error:
 	unlock_page(page);
 	nfs_readdata_free(rdata);
 	return result;
 }
 
-static int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
+int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
 		struct page *page)
 {
 	LIST_HEAD(one_request);
@@ -238,6 +245,11 @@ static int nfs_readpage_async(struct nfs
 
 static void nfs_readpage_release(struct nfs_page *req)
 {
+	struct inode *d_inode = req->wb_context->dentry->d_inode;
+
+	if (PageUptodate(req->wb_page))
+		nfs_readpage_to_fscache(d_inode, req->wb_page, 0);
+
 	unlock_page(req->wb_page);
 
 	dprintk("NFS: read done (%s/%Ld %d@%Ld)\n",
@@ -620,6 +632,10 @@ int nfs_readpage(struct file *file, stru
 		ctx = get_nfs_open_context((struct nfs_open_context *)
 				file->private_data);
 	if (!IS_SYNC(inode)) {
+		error = nfs_readpage_from_fscache(ctx, inode, page);
+		if (error == 0)
+			goto out;
+
 		error = nfs_readpage_async(ctx, inode, page);
 		goto out;
 	}
@@ -650,6 +666,7 @@ readpage_async_filler(void *data, struct
 	unsigned int len;
 
 	nfs_wb_page(inode, page);
+
 	len = nfs_page_length(inode, page);
 	if (len == 0)
 		return nfs_return_empty_page(page);
@@ -689,6 +706,17 @@ int nfs_readpages(struct file *filp, str
 	} else
 		desc.ctx = get_nfs_open_context((struct nfs_open_context *)
 				filp->private_data);
+
+	/* attempt to read as many of the pages as possible from the cache
+	 * - this returns non-zero immediately if the inode has no cookie
+	 */
+	ret = nfs_readpages_from_fscache(desc.ctx, inode, mapping,
+					 pages, &nr_pages);
+	if (ret == 0) {
+		put_nfs_open_context(desc.ctx);
+		return ret; /* all read */
+	}
+
 	ret = read_cache_pages(mapping, pages, readpage_async_filler, &desc);
 	if (!list_empty(&head)) {
 		int err = nfs_pagein_list(&head, server->rpages);
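The read paths in fs/nfs/read.c above only skip the server when the cache
helper returns 0; any other result falls through to the normal NFS read.  A
small sketch of that return-code convention, as read from
nfs_readpage_from_fscache() in this patch, is given below.  The demo_* names
are hypothetical and the sketch models only the decision, not the I/O.

```c
#include <assert.h>
#include <errno.h>

enum demo_source {
	DEMO_FROM_CACHE,
	DEMO_FROM_SERVER,
};

/*
 * Map a cache backend result onto the helper's convention:
 *   0        - read submitted, the page will come from the cache
 *   1        - object or page not in the cache
 *   negative - some other error, passed through unchanged
 */
static int demo_map_cache_result(int ret)
{
	switch (ret) {
	case 0:
		return 0;
	case -ENOBUFS:	/* inode not in cache */
	case -ENODATA:	/* page not in cache */
		return 1;
	default:
		assert(ret < 0);
		return ret;
	}
}

/* The caller treats every non-zero result as "read from the server". */
static enum demo_source demo_choose_source(int cache_ret)
{
	if (demo_map_cache_result(cache_ret) == 0)
		return DEMO_FROM_CACHE;
	return DEMO_FROM_SERVER;
}
```

Under this convention a cache miss and a cache error look the same to the
caller, which simply issues the read to the server.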
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 665949d..f70ea2c 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -291,6 +291,7 @@ static void nfs_show_mount_options(struc
 		{ NFS_MOUNT_NOAC, ",noac", "" },
 		{ NFS_MOUNT_NONLM, ",nolock", "" },
 		{ NFS_MOUNT_NOACL, ",noacl", "" },
+		{ NFS_MOUNT_FSCACHE, ",fsc", "" },
 		{ 0, NULL, NULL }
 	};
 	const struct proc_nfs_info *nfs_infop;
diff --git a/fs/nfs/sysctl.c b/fs/nfs/sysctl.c
index 2fe3403..7a25a6d 100644
--- a/fs/nfs/sysctl.c
+++ b/fs/nfs/sysctl.c
@@ -14,6 +14,7 @@ #include <linux/nfs_idmap.h>
 #include <linux/nfs_fs.h>
 
 #include "callback.h"
+#include "internal.h"
 
 static const int nfs_set_port_min = 0;
 static const int nfs_set_port_max = 65535;
@@ -55,6 +56,48 @@ #endif
 		.proc_handler	= &proc_dointvec_jiffies,
 		.strategy	= &sysctl_jiffies,
 	},
+#ifdef CONFIG_NFS_FSCACHE
+	{
+		.ctl_name = CTL_UNNUMBERED,
+		.procname = "fscache_from_error",
+		.data = &nfs_fscache_from_error,
+		.maxlen = sizeof(int),
+		.mode = 0644,
+		.proc_handler = &proc_dointvec,
+	},
+	{
+		.ctl_name = CTL_UNNUMBERED,
+		.procname = "fscache_to_error",
+		.data = &nfs_fscache_to_error,
+		.maxlen = sizeof(int),
+		.mode = 0644,
+		.proc_handler = &proc_dointvec,
+	},
+	{
+		.ctl_name = CTL_UNNUMBERED,
+		.procname = "fscache_uncache_page",
+		.data = &nfs_fscache_uncache_page,
+		.maxlen = sizeof(int),
+		.mode = 0644,
+		.proc_handler = &proc_dointvec,
+	},
+	{
+		.ctl_name = CTL_UNNUMBERED,
+		.procname = "fscache_to_pages",
+		.data = &nfs_fscache_to_pages,
+		.maxlen = sizeof(int),
+		.mode = 0644,
+		.proc_handler = &proc_dointvec,
+	},
+	{
+		.ctl_name = CTL_UNNUMBERED,
+		.procname = "fscache_from_pages",
+		.data = &nfs_fscache_from_pages,
+		.maxlen = sizeof(int),
+		.mode = 0644,
+		.proc_handler = &proc_dointvec,
+	},
+#endif
 	{ .ctl_name = 0 }
 };
 
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 1f7a6d4..e10d29f 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -63,6 +63,7 @@ #include <linux/smp_lock.h>
 
 #include "delegation.h"
 #include "iostat.h"
+#include "internal.h"
 
 #define NFSDBG_FACILITY		NFSDBG_PAGECACHE
 
@@ -163,6 +164,9 @@ static void nfs_grow_file(struct page *p
 		return;
 	nfs_inc_stats(inode, NFSIOS_EXTENDWRITE);
 	i_size_write(inode, end);
+#ifdef FSCACHE_WRITE_SUPPORT
+	nfs_fscache_set_size(NFS_SERVER(inode), NFS_I(inode), end);
+#endif
 }
 
 /* We can set the PG_uptodate flag if we see that a write request
@@ -342,6 +346,9 @@ do_it:
 		err = -EBADF;
 		goto out;
 	}
+
+	nfs_writepage_to_fscache(inode, page);
+
 	lock_kernel();
 	if (!IS_SYNC(inode) && inode_referenced) {
 		err = nfs_writepage_async(ctx, inode, page, 0, offset);
@@ -425,7 +432,7 @@ static int nfs_inode_add_request(struct 
 		if (nfs_have_delegation(inode, FMODE_WRITE))
 			nfsi->change_attr++;
 	}
-	SetPagePrivate(req->wb_page);
+	SetPageNfsWriting(req->wb_page);
 	nfsi->npages++;
 	atomic_inc(&req->wb_count);
 	return 0;
@@ -442,7 +449,7 @@ static void nfs_inode_remove_request(str
 	BUG_ON (!NFS_WBACK_BUSY(req));
 
 	spin_lock(&nfsi->req_lock);
-	ClearPagePrivate(req->wb_page);
+	ClearPageNfsWriting(req->wb_page);
 	radix_tree_delete(&nfsi->nfs_page_tree, req->wb_index);
 	nfsi->npages--;
 	if (!nfsi->npages) {
diff --git a/include/linux/nfs4_mount.h b/include/linux/nfs4_mount.h
index 26b4c83..15199cc 100644
--- a/include/linux/nfs4_mount.h
+++ b/include/linux/nfs4_mount.h
@@ -65,6 +65,7 @@ #define NFS4_MOUNT_INTR		0x0002	/* 1 */
 #define NFS4_MOUNT_NOCTO	0x0010	/* 1 */
 #define NFS4_MOUNT_NOAC		0x0020	/* 1 */
 #define NFS4_MOUNT_STRICTLOCK	0x1000	/* 1 */
+#define NFS4_MOUNT_FSCACHE	0x4000	/* 1 */
 #define NFS4_MOUNT_FLAGMASK	0xFFFF
 
 #endif
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index a616e60..886e62b 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -30,6 +30,7 @@ #include <linux/nfs_fs_sb.h>
 
 #include <linux/rwsem.h>
 #include <linux/mempool.h>
+#include <linux/fscache.h>
 
 /*
  * Enable debugging support for nfs client.
@@ -185,6 +186,9 @@ #ifdef CONFIG_NFS_V4
 	int			 delegation_state;
 	struct rw_semaphore	rwsem;
 #endif /* CONFIG_NFS_V4*/
+#ifdef CONFIG_NFS_FSCACHE
+	struct fscache_cookie	*fscache;
+#endif
 	struct inode		vfs_inode;
 };
 
@@ -577,6 +581,7 @@ #define NFSDBG_FILE		0x0040
 #define NFSDBG_ROOT		0x0080
 #define NFSDBG_CALLBACK		0x0100
 #define NFSDBG_CLIENT		0x0200
+#define NFSDBG_FSCACHE		0x0400
 #define NFSDBG_ALL		0xFFFF
 
 #ifdef __KERNEL__
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 7ccfc7e..c44be53 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -3,6 +3,7 @@ #define _NFS_FS_SB
 
 #include <linux/list.h>
 #include <linux/backing-dev.h>
+#include <linux/fscache.h>
 
 struct nfs_iostats;
 
@@ -67,6 +68,10 @@ #ifdef CONFIG_NFS_V4
 	char			cl_ipaddr[16];
 	unsigned char		cl_id_uniquifier;
 #endif
+
+#ifdef CONFIG_NFS_FSCACHE
+	struct fscache_cookie	*fscache;	/* client index cache cookie */
+#endif
 };
 
 /*
diff --git a/include/linux/nfs_mount.h b/include/linux/nfs_mount.h
index 659c754..278bb4e 100644
--- a/include/linux/nfs_mount.h
+++ b/include/linux/nfs_mount.h
@@ -61,6 +61,7 @@ #define NFS_MOUNT_BROKEN_SUID	0x0400	/* 
 #define NFS_MOUNT_NOACL		0x0800	/* 4 */
 #define NFS_MOUNT_STRICTLOCK	0x1000	/* reserved for NFSv4 */
 #define NFS_MOUNT_SECFLAVOUR	0x2000	/* 5 */
+#define NFS_MOUNT_FSCACHE	0x4000
 #define NFS_MOUNT_FLAGMASK	0xFFFF
 
 #endif


* [PATCH 6/7] FS-Cache: CacheFiles: ia64: missing copy_page export [try #13]
  2006-08-30 19:31 [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13] David Howells
                   ` (3 preceding siblings ...)
  2006-08-30 19:32 ` [PATCH 5/7] NFS: Use local caching " David Howells
@ 2006-08-30 19:32 ` David Howells
  2006-08-30 19:52 ` [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing " Andrew Morton
  5 siblings, 0 replies; 70+ messages in thread
From: David Howells @ 2006-08-30 19:32 UTC (permalink / raw)
  To: torvalds, akpm, steved, trond.myklebust
  Cc: linux-fsdevel, linux-cachefs, nfsv4, linux-kernel

This one-line patch adds the copy_page export that the CacheFiles patches
require on ia64.  The patch is not yet upstream; it will be pushed upstream
when CacheFiles goes upstream.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
---

 arch/ia64/kernel/ia64_ksyms.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/kernel/ia64_ksyms.c b/arch/ia64/kernel/ia64_ksyms.c
index 3ead20f..6746a3e 100644
--- a/arch/ia64/kernel/ia64_ksyms.c
+++ b/arch/ia64/kernel/ia64_ksyms.c
@@ -42,6 +42,7 @@ EXPORT_SYMBOL(__do_clear_user);
 EXPORT_SYMBOL(__strlen_user);
 EXPORT_SYMBOL(__strncpy_from_user);
 EXPORT_SYMBOL(__strnlen_user);
+EXPORT_SYMBOL(copy_page);
 
 /* from arch/ia64/lib */
 extern void __divsi3(void);


* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing  [try #13]
  2006-08-30 19:31 [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13] David Howells
                   ` (4 preceding siblings ...)
  2006-08-30 19:32 ` [PATCH 6/7] FS-Cache: CacheFiles: ia64: missing copy_page export " David Howells
@ 2006-08-30 19:52 ` Andrew Morton
  2006-08-30 20:37   ` David Howells
  5 siblings, 1 reply; 70+ messages in thread
From: Andrew Morton @ 2006-08-30 19:52 UTC (permalink / raw)
  To: David Howells
  Cc: torvalds, steved, trond.myklebust, linux-fsdevel, linux-cachefs,
	nfsv4, linux-kernel

On Wed, 30 Aug 2006 20:31:53 +0100
David Howells <dhowells@redhat.com> wrote:

> These patches add local caching for network filesystems such as NFS and AFS.

<fercrissake>

Not interested.  Please go learn quilt, send incremental patches.


* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-08-30 19:52 ` [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing " Andrew Morton
@ 2006-08-30 20:37   ` David Howells
  2006-08-30 20:55     ` Andrew Morton
  0 siblings, 1 reply; 70+ messages in thread
From: David Howells @ 2006-08-30 20:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Howells, torvalds, steved, trond.myklebust, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

Andrew Morton <akpm@osdl.org> wrote:

> > These patches add local caching for network filesystems such as NFS and AFS.
> 
> <fercrissake>
> 
> Not interested.  Please go learn quilt, send incremental patches.

What's quilt able to do that StGIT can't?  AFAICT from quilt's manpage, it
can't mail incremental patches, so how does it help anyway?

David


* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-08-30 20:37   ` David Howells
@ 2006-08-30 20:55     ` Andrew Morton
  2006-08-31  9:58       ` David Howells
  0 siblings, 1 reply; 70+ messages in thread
From: Andrew Morton @ 2006-08-30 20:55 UTC (permalink / raw)
  To: David Howells
  Cc: torvalds, steved, trond.myklebust, linux-fsdevel, linux-cachefs,
	nfsv4, linux-kernel

On Wed, 30 Aug 2006 21:37:18 +0100
David Howells <dhowells@redhat.com> wrote:

> Andrew Morton <akpm@osdl.org> wrote:
> 
> > > These patches add local caching for network filesystems such as NFS and AFS.
> > 
> > <fercrissake>
> > 
> > Not interested.  Please go learn quilt, send incremental patches.
> 
> What's quilt able to do that StGIT can't?  AFAICT from quilt's manpage, it
> can't mail incremental patches, so how does it help anyway?
> 

It was just a suggestion.  Please:

- test the patches which are presently in -mm.  I don't even know if they
  work, and we prefer to send Linus working stuff.

- Send fine-grained incremental patches.  It's OK to do complete
  replacement patchsets when the code is new, but this stuff is supposed to
  be stabilised.

  It took me quite a lot of time to extract the incremental patches out
  of try#12 and I don't want to do it again, plus it's just another step in
  which errors can be introduced.

Why incremental patches?

- So we can see what changed and don't have to re-review the whole thing

- So the recipient doesn't have to re-fix the same pile of rejects each time.

- So fixes which came in via other sources don't get lost.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-08-30 20:55     ` Andrew Morton
@ 2006-08-31  9:58       ` David Howells
  2006-08-31 17:21         ` Andrew Morton
  0 siblings, 1 reply; 70+ messages in thread
From: David Howells @ 2006-08-31  9:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Howells, torvalds, steved, trond.myklebust, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

Andrew Morton <akpm@osdl.org> wrote:

> - Send fine-grained incremental patches.  It's OK to do complete
>   replacement patchsets when the code is new, but this stuff is supposed to
>   be stabilised.

I thought the code was still officially *new*.

As I understood things from what you said, you delegated responsibility for my
patches on to Trond, who hasn't taken them yet.  He has further delegated
review responsibility on to Christoph, so I've been consolidating my patches
to make it easier for Christoph (or whoever) to do so.

So, as I understand the situation, my patches won't go anywhere until
Christoph ACKs them and Trond takes them into his tree.  If this isn't so,
please clarify the situation.

David

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-08-31  9:58       ` David Howells
@ 2006-08-31 17:21         ` Andrew Morton
  2006-08-31 17:26           ` Trond Myklebust
                             ` (2 more replies)
  0 siblings, 3 replies; 70+ messages in thread
From: Andrew Morton @ 2006-08-31 17:21 UTC (permalink / raw)
  To: David Howells
  Cc: torvalds, steved, trond.myklebust, linux-fsdevel, linux-cachefs,
	nfsv4, linux-kernel

On Thu, 31 Aug 2006 10:58:30 +0100
David Howells <dhowells@redhat.com> wrote:

> Andrew Morton <akpm@osdl.org> wrote:
> 
> > - Send fine-grained incremental patches.  It's OK to do complete
> >   replacement patchsets when the code is new, but this stuff is supposed to
> >   be stabilised.
> 
> I thought the code was still officially *new*.

It's been floating around for ages; we want it to become *old*, showing a
decreasing rate of change.

> As I understood things from what you said, you delegated responsibility for my
> patches on to Trond, who hasn't taken them yet.

Trond merged the large nfs-affecting ones; I don't know if he intends to
handle the non-nfs bulk of the work though.

It doesn't matter, really - I'll frequently carry features with a plan to
send them into a subsystem tree.  Or Trond could duck it and I can send the
patches direct to Linus after git-nfs has merged.

Either way, the patches which are presently in -mm are "in the pipeline" -
they're the ones which people are testing (for compile, at least) and
reviewing (hah).  If we decide to send them into Trond then I'll add them
to my things-to-spam-maintainers-with pile.

Your CONFIG_BLOCK patches did a decent job of trashing your
fs-cache-make-kafs-* patches, btw.  What's up with that?  OK, it's sensible
for people to work against mainline but the net effect of doing that is to
create a mess for other people to clean up.

>  He has further delegated
> review responsibility on to Christoph, so I've been consolidating my patches
> to make it easier for Christoph (or whoever) to do so.

These patches are quite large and complex.  Frankly, I doubt if Trond or
Christoph have the bandwidth to review them.  It would be excellent if they
were able to, but...

We have a large coder-versus-reviewer imbalance, especially in the
filesystems area.  cf reiser4.

> So, as I understand the situation, my patches won't go anywhere until
> Christoph ACKs them and Trond takes them into his tree.  If this isn't so,
> please clarify the situation.
> 

If Christoph acks them then I can send them to Trond or Linus, at Trond's
option.

Or I can butt out, drop the patches, wait for them to turn up in Trond's
tree, at your option.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-08-31 17:21         ` Andrew Morton
@ 2006-08-31 17:26           ` Trond Myklebust
  2006-08-31 17:42           ` David Howells
  2006-09-01 13:08           ` David Howells
  2 siblings, 0 replies; 70+ messages in thread
From: Trond Myklebust @ 2006-08-31 17:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Howells, torvalds, steved, linux-fsdevel, linux-cachefs,
	nfsv4, linux-kernel

On Thu, 2006-08-31 at 10:21 -0700, Andrew Morton wrote:
> If Christoph acks them then I can send them to Trond or Linus, at Trond's
> option.
> 
> Or I can butt out, drop the patches, wait for them to turn up in Trond's
> tree, at your option.

I don't mind pulling them into my tree, but since Christoph had
objections to earlier implementations, and specifically asked me to put
a hold on the non-NFS related patches, then I'd first like an ACK from
him stating that he is now happy with the way those objections have been
handled in the updates.

Cheers,
  Trond


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-08-31 17:21         ` Andrew Morton
  2006-08-31 17:26           ` Trond Myklebust
@ 2006-08-31 17:42           ` David Howells
  2006-08-31 18:04             ` Andrew Morton
  2006-09-01 13:08           ` David Howells
  2 siblings, 1 reply; 70+ messages in thread
From: David Howells @ 2006-08-31 17:42 UTC (permalink / raw)
  To: Andrew Morton, trond.myklebust, hch
  Cc: David Howells, torvalds, steved, linux-fsdevel, linux-cachefs,
	nfsv4, linux-kernel

Andrew Morton <akpm@osdl.org> wrote:

> Trond merged the large nfs-affecting ones; I don't know if he intends to
> handle the non-nfs bulk of the work though.

There is one large NFS affecting patch left: namely the one that makes NFS use
FS-Cache.  I presume that requires Trond's agreement to merge.

> Your CONFIG_BLOCK patches did a decent job of trashing your
> fs-cache-make-kafs-* patches, btw.  What's up with that?  OK, it's sensible
> for people to work against mainline but the net effect of doing that is to
> create a mess for other people to clean up.

Hmmm...  Jens wanted my block patches against his tree; you wanted my NFS
patches against Trond's NFS tree.  I guess I should try stacking the whole
lot, but against what?  And who carries the fixes?  A patch to fix this
problem may well only apply to a tree that's the conjunction of both:-/

> If Christoph acks them then I can send them to Trond or Linus, at Trond's
> option.
> 
> Or I can butt out, drop the patches, wait for them to turn up in Trond's
> tree, at your option.

Trond, Christoph?  Any thoughts?


David

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-08-31 17:42           ` David Howells
@ 2006-08-31 18:04             ` Andrew Morton
  0 siblings, 0 replies; 70+ messages in thread
From: Andrew Morton @ 2006-08-31 18:04 UTC (permalink / raw)
  To: David Howells
  Cc: trond.myklebust, hch, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Thu, 31 Aug 2006 18:42:08 +0100
David Howells <dhowells@redhat.com> wrote:

> > Your CONFIG_BLOCK patches did a decent job of trashing your
> > fs-cache-make-kafs-* patches, btw.  What's up with that?  OK, it's sensible
> > for people to work against mainline but the net effect of doing that is to
> > create a mess for other people to clean up.
> 
> Hmmm...  Jens wanted my block patches against his tree; you wanted my NFS
> patches against Trond's NFS tree.  I guess I should try stacking the whole
> lot, but against what?  And who carries the fixes?  A patch to fix this
> problem may well only apply to a tree that's the conjunction of both:-/

There is no easy solution, particularly with a patch like that one which
splatters itself all over the place.

The best time to do such things is against 2.6.x-rc1, when everyone is
maximally-merged-up.  The worst time is when we're at 2.6.x-rc5, when
everyone is maximally-unmerged-up.

If we're at -rc5 and one doesn't want to wait for a few weeks then one can
work against the -mm lineup, because then when we hit -rc1 and the
subsystems are merged up, the proposed patch will slot in nicely with
minimal breakage: no queue-jumping.

The exception to that rule is patches which move files around.  Because
even a single-line change in one of the affected files will cause the
move-things-around patch to break, and to need somewhat risky rework.  In
that case, simply waiting until -rc1 is the best approach.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-08-31 17:21         ` Andrew Morton
  2006-08-31 17:26           ` Trond Myklebust
  2006-08-31 17:42           ` David Howells
@ 2006-09-01 13:08           ` David Howells
  2006-09-01 16:34             ` Andrew Morton
  2 siblings, 1 reply; 70+ messages in thread
From: David Howells @ 2006-09-01 13:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Howells, torvalds, steved, trond.myklebust, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

Andrew Morton <akpm@osdl.org> wrote:

> Your CONFIG_BLOCK patches did a decent job of trashing your
> fs-cache-make-kafs-* patches, btw.  What's up with that?  OK, it's sensible
> for people to work against mainline but the net effect of doing that is to
> create a mess for other people to clean up.

It seems the only problem in my patches is that the file address space
operations have had the sync_pages op removed in a patch in the
disable-block-layer patchset as it's no longer necessary.

However, as I suspect you're applying the block patches *before* the FS-Cache
patches, I can't give you an incremental patch that you can apply after the
other fs-cache-make-kafs-* patches, since you need to modify the first patch
(fs-cache-make-kafs-use-fs-cache.patch) to get it to apply at all now.

So, I could issue a revised AFS+FS-Cache patch, would that do?  Or would you
rather have a patch that you can apply to the one you already have directly
and modify it in place?

David

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-01 13:08           ` David Howells
@ 2006-09-01 16:34             ` Andrew Morton
  2006-09-01 17:00               ` Trond Myklebust
  2006-09-04 18:20               ` David Howells
  0 siblings, 2 replies; 70+ messages in thread
From: Andrew Morton @ 2006-09-01 16:34 UTC (permalink / raw)
  To: David Howells
  Cc: torvalds, steved, trond.myklebust, linux-fsdevel, linux-cachefs,
	nfsv4, linux-kernel

On Fri, 01 Sep 2006 14:08:34 +0100
David Howells <dhowells@redhat.com> wrote:

> Andrew Morton <akpm@osdl.org> wrote:
> 
> > Your CONFIG_BLOCK patches did a decent job of trashing your
> > fs-cache-make-kafs-* patches, btw.  What's up with that?  OK, it's sensible
> > for people to work against mainline but the net effect of doing that is to
> > create a mess for other people to clean up.
> 
> It seems the only problem in my patches is that the file address space
> operations have had the sync_pages op removed in a patch in the
> disable-block-layer patchset as it's no longer necessary.
> 
> However, as I suspect you're applying the block patches *before* the FS-Cache
> patches,

Yes, I stage the subsystem trees ahead of everything else.  So a) things
which get merged into a subsystem tree effectively do a queue-jump and b) I
spend much of the merge window twiddling thumbs until the git trees have
merged.

> I can't give you an incremental patch that you can apply after the
> other fs-cache-make-kafs-* patches, since you need to modify the first patch
> (fs-cache-make-kafs-use-fs-cache.patch) to get it to apply at all now.
> 
> So, I could issue a revised AFS+FS-Cache patch, would that do?  Or would you
> rather have a patch that you can apply to the one you already have directly
> and modify it in place?

I fixed it all up, I think.  Please review-and-test rc5-mm1 (including
hot-fixes/ contents, which grows apace).

nfs automounter submounts are still broken in Trond's tree, btw.  Are we stuck?

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-01 16:34             ` Andrew Morton
@ 2006-09-01 17:00               ` Trond Myklebust
  2006-09-02  2:50                 ` Andrew Morton
  2006-09-04 18:20               ` David Howells
  1 sibling, 1 reply; 70+ messages in thread
From: Trond Myklebust @ 2006-09-01 17:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Howells, torvalds, steved, linux-fsdevel, linux-cachefs,
	nfsv4, linux-kernel

On Fri, 2006-09-01 at 09:34 -0700, Andrew Morton wrote:

> nfs automounter submounts are still broken in Trond's tree, btw.  Are we stuck?

You mean autofs indirect maps?

I'll see if I can't get my hands on an selinux setup like yours in order
to do some debugging. AFAICS, the non-selinux case works fine, though.

Cheers,
  Trond


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-01 17:00               ` Trond Myklebust
@ 2006-09-02  2:50                 ` Andrew Morton
  2006-09-02  4:11                   ` Ian Kent
                                     ` (3 more replies)
  0 siblings, 4 replies; 70+ messages in thread
From: Andrew Morton @ 2006-09-02  2:50 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: David Howells, torvalds, steved, linux-fsdevel, linux-cachefs,
	nfsv4, linux-kernel, Ian Kent

On Fri, 01 Sep 2006 13:00:44 -0400
Trond Myklebust <trond.myklebust@fys.uio.no> wrote:

> On Fri, 2006-09-01 at 09:34 -0700, Andrew Morton wrote:
> 
> > nfs automounter submounts are still broken in Trond's tree, btw.  Are we stuck?
> 
> You mean autofs indirect maps?

I don't know what that is.

> I'll see if I can't get my hands on an selinux setup like yours in order
> to do some debugging. AFAICS, the non-selinux case works fine, though.

It doesn't appear to be related to selinux.

On a stock, mostly-up-to-date FC5 installation:

	echo 0 > /selinux/enforce
	service autofs stop
	service nfs stop
	service nfs start
	service autofs start


sony:/home/akpm> ls -l /net/bix/usr/src
total 0

sony:/home/akpm> showmount -e bix
Export list for bix:
/           *
/usr/src    *
/mnt/export *


The automounter will mount bix:/ on /net/bix.  But I am unable to get it to
mount bix's /usr/src on /net/bix/usr/src.

On bix we have

bix:/home/akpm> mount
/dev/sda2 on / type ext3 (rw,noatime)
/dev/sdb1 on /usr/src type ext3 (rw,noatime)
...


Without git-nfs applied, /net/bix/usr/src mounts as expected.

iirc, we decided this is related to the fs-cache infrastructure work which
went into git-nfs.  I think David can reproduce this?


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-02  2:50                 ` Andrew Morton
@ 2006-09-02  4:11                   ` Ian Kent
  2006-09-02  5:58                     ` Andrew Morton
  2006-09-02  4:49                   ` Ian Kent
                                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 70+ messages in thread
From: Ian Kent @ 2006-09-02  4:11 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Trond Myklebust, David Howells, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Fri, 2006-09-01 at 19:50 -0700, Andrew Morton wrote:
> On Fri, 01 Sep 2006 13:00:44 -0400
> Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> 
> > On Fri, 2006-09-01 at 09:34 -0700, Andrew Morton wrote:
> > 
> > > nfs automounter submounts are still broken in Trond's tree, btw.  Are we stuck?
> > 
> > You mean autofs indirect maps?
> 
> I don't know what that is.
> 
> > I'll see if I can't get my hands on an selinux setup like yours in order
> > to do some debugging. AFAICS, the non-selinux case works fine, though.
> 
> It doesn't appear to be related to selinux.
> 
> On a stock, mostly-up-to-date FC5 installation:
> 
> 	echo 0 > /selinux/enforce
> 	service autofs stop
> 	service nfs stop
> 	service nfs start
> 	service autofs start
> 
> 
> sony:/home/akpm> ls -l /net/bix/usr/src
> total 0
> 
> sony:/home/akpm> showmount -e bix
> Export list for bix:
> /           *
> /usr/src    *
> /mnt/export *
> 
> 
> The automounter will mount bix:/ on /net/bix.  But I am unable to get it to
> mount bix's /usr/src on /net/bix/usr/src.

Is it the same symptom as before or is it that bix:/usr/src is not also
being mounted?

> Without git-nfs applied, /net/bix/usr/src mounts as expected.
> 
> iirc, we decided this is related to the fs-cache infrastructure work which
> went into git-nfs.  I think David can reproduce this?

I'll build the latest mm kernel and try to reproduce it.
From memory I couldn't reproduce it last time I tried.
Is there anything I need to add to rc5-mm1 for this?

Ian




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-02  2:50                 ` Andrew Morton
  2006-09-02  4:11                   ` Ian Kent
@ 2006-09-02  4:49                   ` Ian Kent
  2006-09-04 11:52                   ` David Howells
  2006-09-04 11:52                   ` David Howells
  3 siblings, 0 replies; 70+ messages in thread
From: Ian Kent @ 2006-09-02  4:49 UTC (permalink / raw)
  To: Andrew Morton, Trond Myklebust
  Cc: David Howells, Linus Torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel


On Fri, 1 Sep 2006 19:50:09 -0700, "Andrew Morton" <akpm@osdl.org> said:
> On Fri, 01 Sep 2006 13:00:44 -0400
> Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> 
> > On Fri, 2006-09-01 at 09:34 -0700, Andrew Morton wrote:
> > 
> > > nfs automounter submounts are still broken in Trond's tree, btw.  Are we stuck?
> > 
> > You mean autofs indirect maps?
> 
> I don't know what that is.
> 

The mount that Andrew is describing is a "host" type mount.
autofs gets the host name as a key and is expected to mount all
filesystems exported from the host. It does this by attempting to
mount each export in shortest to longest order (to take account
of nesting of the mounts).
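
A rough sketch of that ordering, in Python rather than autofs's actual
code. The function name mount_order is made up for illustration, and the
export list is the one from Andrew's showmount output. Sorting by path
length is enough to put a parent export ahead of anything nested under
it, since the parent path is always a shorter prefix:

```python
def mount_order(exports):
    """Order export paths shortest first, so parents precede nested exports.

    Hypothetical helper for illustration only; not taken from autofs.
    """
    # Sort by string length, then alphabetically to keep ties deterministic.
    return sorted(exports, key=lambda path: (len(path), path))


if __name__ == "__main__":
    # Exports reported by `showmount -e bix` earlier in the thread.
    for path in mount_order(["/usr/src", "/mnt/export", "/"]):
        print(path)
```

With this order, bix:/ is attempted first, so /net/bix/usr/src and
/net/bix/mnt/export are created inside an already-mounted root.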

Ian



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-02  4:11                   ` Ian Kent
@ 2006-09-02  5:58                     ` Andrew Morton
  2006-09-03  6:21                       ` Ian Kent
  0 siblings, 1 reply; 70+ messages in thread
From: Andrew Morton @ 2006-09-02  5:58 UTC (permalink / raw)
  To: Ian Kent
  Cc: Trond Myklebust, David Howells, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Sat, 02 Sep 2006 12:11:12 +0800
Ian Kent <raven@themaw.net> wrote:

> On Fri, 2006-09-01 at 19:50 -0700, Andrew Morton wrote:
> > On Fri, 01 Sep 2006 13:00:44 -0400
> > Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> > 
> > > On Fri, 2006-09-01 at 09:34 -0700, Andrew Morton wrote:
> > > 
> > > > nfs automounter submounts are still broken in Trond's tree, btw.  Are we stuck?
> > > 
> > > You mean autofs indirect maps?
> > 
> > I don't know what that is.
> > 
> > > I'll see if I can't get my hands on an selinux setup like yours in order
> > > to do some debugging. AFAICS, the non-selinux case works fine, though.
> > 
> > It doesn't appear to be related to selinux.
> > 
> > On a stock, mostly-up-to-date FC5 installation:
> > 
> > 	echo 0 > /selinux/enforce
> > 	service autofs stop
> > 	service nfs stop
> > 	service nfs start
> > 	service autofs start
> > 
> > 
> > sony:/home/akpm> ls -l /net/bix/usr/src
> > total 0
> > 
> > sony:/home/akpm> showmount -e bix
> > Export list for bix:
> > /           *
> > /usr/src    *
> > /mnt/export *
> > 
> > 
> > The automounter will mount bix:/ on /net/bix.  But I am unable to get it to
> > mount bix's /usr/src on /net/bix/usr/src.
> 
> Is it the same symptom as before or is it that bix:/usr/src is not also
> being mounted?

When this saga first started an `ls -l /net/bix' showed a corrupted dentry
for /net/bix/usr.  It was determined that this was SELinux-related.  Fixes were
made and that no longer occurs.

Now, treading on /net/bix/usr/src does not cause bix:/usr/src to be mounted
at /net/bix/usr/src.  Without git-nfs that mount does occur.

The present behaviour is unchanged if /selinux/enforce is set to 0.

> > Without git-nfs applied, /net/bix/usr/src mounts as expected.
> > 
> > iirc, we decided this is related to the fs-cache infrastructure work which
> > went into git-nfs.  I think David can reproduce this?
> 
> I'll build the latest mm kernel and try to reproduce it.
> From memory I couldn't reproduce it last time I tried.
> Is there anything I need to add to rc5-mm1 for this?

Nope.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-02  5:58                     ` Andrew Morton
@ 2006-09-03  6:21                       ` Ian Kent
  2006-09-03  6:30                         ` Andrew Morton
  0 siblings, 1 reply; 70+ messages in thread
From: Ian Kent @ 2006-09-03  6:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Trond Myklebust, David Howells, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Fri, 2006-09-01 at 22:58 -0700, Andrew Morton wrote:
> > > 
> > > It doesn't appear to be related to selinux.

I have a festering suspicion, but no evidence yet, that this is not
always the case.

> > > 
> > > On a stock, mostly-up-to-date FC5 installation:
> > > 
> > > 	echo 0 > /selinux/enforce
> > > 	service autofs stop
> > > 	service nfs stop
> > > 	service nfs start
> > > 	service autofs start

I've now set up my little system the same way.

[root@raven selinux]# uname -a
Linux raven.themaw.net 2.6.18-rc5-mm1 #1 SMP Sat Sep 2 23:11:01 WST 2006 x86_64 x86_64 x86_64 GNU/Linux

[root@raven selinux]# rpm -q autofs
autofs-4.1.4-29

[root@raven selinux]# getenforce
Permissive

[root@raven selinux]# rpm -q selinux-policy
selinux-policy-2.3.7-2.fc5

> > > 
> > > 
> > > sony:/home/akpm> ls -l /net/bix/usr/src
> > > total 0
> > > 
> > > sony:/home/akpm> showmount -e bix
> > > Export list for bix:
> > > /           *
> > > /usr/src    *
> > > /mnt/export *

Almost the same.

[root@raven selinux]# showmount -e budgie
Export list for budgie:
/        *
/usr/src *

> > > 
> > > 
> > > The automounter will mount bix:/ on /net/bix.  But I am unable to get it to
> > > mount bix's /usr/src on /net/bix/usr/src.
> > 
> > Is it the same symptom as before or is it that bix:/usr/src is not also
> > being mounted?

[root@raven selinux]# lsmod|grep autofs
autofs4                40776  1

I guess you haven't got the autofs module loaded instead of autofs4 by
mistake.

[raven@raven ~]$ mount
/dev/hda5 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/hda6 on /home type ext3 (rw)
/dev/hda7 on /work type ext3 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
automount(pid3463) on /net type autofs (rw,fd=5,pgrp=3463,minproto=2,maxproto=4)

[raven@raven ~]$ ls /net/budgie
autofs  cdrom  export71  initrd          lib         opt   sbin  usr  vmlinuz.old
bin     dev    floppy    initrd.img      lost+found  proc  sys   var
boot    etc    home      initrd.img.old  mnt         root  tmp   vmlinuz
[raven@raven ~]$ mount
/dev/hda5 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/hda6 on /home type ext3 (rw)
/dev/hda7 on /work type ext3 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
automount(pid3463) on /net type autofs (rw,fd=5,pgrp=3463,minproto=2,maxproto=4)
budgie:/ on /net/budgie type nfs (rw,nosuid,nodev,hard,intr,addr=10.49.97.33)
budgie:/usr/src on /net/budgie/usr/src type nfs (rw,nosuid,nodev,hard,intr,addr=10.49.97.33)

So I wonder what the difference is between the setups?

> 
> When this saga first started an `ls -l /net/bix' showed a corrupted dentry
> for /net/bix/usr.  It was determined that this was SELinux-related.  Fixes were
> made and that no longer occurs.
> 
> Now, treading on /net/bix/usr/src does not cause bix:/usr/src to be mounted
> at /net/bix/usr/src.  Without git-nfs that mount does occur.
> 
> The present behaviour is unchanged if /selinux/enforce is set to 0.
> 
> > > Without git-nfs applied, /net/bix/usr/src mounts as expected.
> > > 
> > > iirc, we decided this is related to the fs-cache infrastructure work which
> > > went into git-nfs.  I think David can reproduce this?

Can you reproduce this David?

> > 
> > I'll build the latest mm kernel and try to reproduce it.
> > From memory I couldn't reproduce it last time I tried.
> > Is there anything I need to add to rc5-mm1 for this?
> 
> Nope.

I'm stumped.

Ian




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-03  6:21                       ` Ian Kent
@ 2006-09-03  6:30                         ` Andrew Morton
  2006-09-03  6:43                           ` Ian Kent
  0 siblings, 1 reply; 70+ messages in thread
From: Andrew Morton @ 2006-09-03  6:30 UTC (permalink / raw)
  To: Ian Kent
  Cc: Trond Myklebust, David Howells, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Sun, 03 Sep 2006 14:21:30 +0800
Ian Kent <raven@themaw.net> wrote:

> I guess you haven't got the autofs module loaded instead of autofs4 by
> mistake.

Nope.

> So I wonder what the difference is between the setups?

Beats me.  Maybe cook up a debug patch?




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-03  6:30                         ` Andrew Morton
@ 2006-09-03  6:43                           ` Ian Kent
  2006-09-03 16:58                             ` Andrew Morton
  0 siblings, 1 reply; 70+ messages in thread
From: Ian Kent @ 2006-09-03  6:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Trond Myklebust, David Howells, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Sat, 2006-09-02 at 23:30 -0700, Andrew Morton wrote:
> On Sun, 03 Sep 2006 14:21:30 +0800
> Ian Kent <raven@themaw.net> wrote:
> 
> > I guess you haven't got the autofs module loaded instead of autofs4 by
> > mistake.
> 
> Nope.
> 
> > So I wonder what the difference is between the setups?
> 
> Beats me.  Maybe cook up a debug patch?

OK.

Could you add "--debug" to DAEMONOPTIONS in /etc/sysconfig/autofs and
post the output so I can get some idea where to put the prints please.

Ian




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-03  6:43                           ` Ian Kent
@ 2006-09-03 16:58                             ` Andrew Morton
  2006-09-04  2:23                               ` Ian Kent
  2006-09-04  5:40                               ` Ian Kent
  0 siblings, 2 replies; 70+ messages in thread
From: Andrew Morton @ 2006-09-03 16:58 UTC (permalink / raw)
  To: Ian Kent
  Cc: Trond Myklebust, David Howells, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Sun, 03 Sep 2006 14:43:00 +0800
Ian Kent <raven@themaw.net> wrote:

> On Sat, 2006-09-02 at 23:30 -0700, Andrew Morton wrote:
> > On Sun, 03 Sep 2006 14:21:30 +0800
> > Ian Kent <raven@themaw.net> wrote:
> > 
> > > I guess you haven't got the autofs module loaded instead of autofs4 by
> > > mistake.
> > 
> > Nope.
> > 
> > > So I wonder what the difference is between the setups?
> > 
> > Beats me.  Maybe cook up a debug patch?
> 
> OK.
> 
> Could you add "--debug" to DAEMONOPTIONS in /etc/sysconfig/autofs and
> post the output so I can get some idea where to put the prints please.
> 

Sep  3 09:56:40 sony automount[18446]: starting automounter version 4.1.4-29, path = /net, maptype = program, mapname = /etc/auto.net
Sep  3 09:56:40 sony kernel: SELinux: initialized (dev autofs, type autofs), uses genfs_contexts
Sep  3 09:56:40 sony automount[18446]: using kernel protocol version 4.00
Sep  3 09:56:40 sony automount[18446]: using timeout 60 seconds; freq 15 secs
Sep  3 09:56:53 sony automount[18446]: attempting to mount entry /net/bix
Sep  3 09:56:53 sony kernel: SELinux: initialized (dev 0:16, type nfs), uses genfs_contexts
Sep  3 09:56:53 sony automount[18453]: mount(nfs): mkdir_path /net/bix/usr/src failed: Permission denied
Sep  3 09:56:53 sony automount[18453]: mount(nfs): mkdir_path /net/bix/mnt/export failed: Permission denied


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-03 16:58                             ` Andrew Morton
@ 2006-09-04  2:23                               ` Ian Kent
  2006-09-04  5:40                               ` Ian Kent
  1 sibling, 0 replies; 70+ messages in thread
From: Ian Kent @ 2006-09-04  2:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Trond Myklebust, David Howells, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Sun, 2006-09-03 at 09:58 -0700, Andrew Morton wrote:
> On Sun, 03 Sep 2006 14:43:00 +0800
> Ian Kent <raven@themaw.net> wrote:
> 
> > On Sat, 2006-09-02 at 23:30 -0700, Andrew Morton wrote:
> > > On Sun, 03 Sep 2006 14:21:30 +0800
> > > Ian Kent <raven@themaw.net> wrote:
> > > 
> > > > I guess you haven't got the autofs module loaded instead of autofs4 by
> > > > mistake.
> > > 
> > > Nope.
> > > 
> > > > So I wonder what the difference is between the setups?
> > > 
> > > Beats me.  Maybe cook up a debug patch?
> > 
> > OK.
> > 
> > Could you add "--debug" to DAEMONOPTIONS in /etc/sysconfig/autofs and
> > post the output so I can get some idea where to put the prints please.
> > 
> 
> Sep  3 09:56:40 sony automount[18446]: starting automounter version 4.1.4-29, path = /net, maptype = program, mapname = /etc/auto.net
> Sep  3 09:56:40 sony kernel: SELinux: initialized (dev autofs, type autofs), uses genfs_contexts
> Sep  3 09:56:40 sony automount[18446]: using kernel protocol version 4.00
> Sep  3 09:56:40 sony automount[18446]: using timeout 60 seconds; freq 15 secs
> Sep  3 09:56:53 sony automount[18446]: attempting to mount entry /net/bix
> Sep  3 09:56:53 sony kernel: SELinux: initialized (dev 0:16, type nfs), uses genfs_contexts
> Sep  3 09:56:53 sony automount[18453]: mount(nfs): mkdir_path /net/bix/usr/src failed: Permission denied
> Sep  3 09:56:53 sony automount[18453]: mount(nfs): mkdir_path /net/bix/mnt/export failed: Permission denied

Yes, and these should be EEXIST.

Could you humor me a little further and load the base SELinux policy
that enables auditing of the access failures normally hidden by
"dontaudit" rules, using:

semodule -b /usr/share/selinux/targeted/enableaudit.pp

and see if we get any avc messages.

Ian





* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-03 16:58                             ` Andrew Morton
  2006-09-04  2:23                               ` Ian Kent
@ 2006-09-04  5:40                               ` Ian Kent
  1 sibling, 0 replies; 70+ messages in thread
From: Ian Kent @ 2006-09-04  5:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Trond Myklebust, David Howells, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Sun, 2006-09-03 at 09:58 -0700, Andrew Morton wrote:
> On Sun, 03 Sep 2006 14:43:00 +0800
> Ian Kent <raven@themaw.net> wrote:
> 
> > On Sat, 2006-09-02 at 23:30 -0700, Andrew Morton wrote:
> > > On Sun, 03 Sep 2006 14:21:30 +0800
> > > Ian Kent <raven@themaw.net> wrote:
> > > 
> > > > I guess you haven't got the autofs module loaded instead of autofs4 by
> > > > mistake.
> > > 
> > > Nope.
> > > 
> > > > So I wonder what the different is between the setups?
> > > 
> > > Beats me.  Maybe cook up a debug patch?
> > 
> > OK.
> > 
> > Could you add "--debug" to DAEMONOPTIONS in /etc/sysconfig/autofs and
> > post the output so I can get some idea where to put the prints please.
> > 
> 
> Sep  3 09:56:40 sony automount[18446]: starting automounter version 4.1.4-29, path = /net, maptype = program, mapname = /etc/auto.net
> Sep  3 09:56:40 sony kernel: SELinux: initialized (dev autofs, type autofs), uses genfs_contexts
> Sep  3 09:56:40 sony automount[18446]: using kernel protocol version 4.00
> Sep  3 09:56:40 sony automount[18446]: using timeout 60 seconds; freq 15 secs
> Sep  3 09:56:53 sony automount[18446]: attempting to mount entry /net/bix
> Sep  3 09:56:53 sony kernel: SELinux: initialized (dev 0:16, type nfs), uses genfs_contexts
> Sep  3 09:56:53 sony automount[18453]: mount(nfs): mkdir_path /net/bix/usr/src failed: Permission denied
> Sep  3 09:56:53 sony automount[18453]: mount(nfs): mkdir_path /net/bix/mnt/export failed: Permission denied

I'm able to reproduce this now.
My mistake: my exports had the no_root_squash option set.
Without it I get the "Permission denied" errors and no mount.

Looking through the git-nfs patch I can't see what changed to make mkdir
return a permission error, perhaps before it gets to the EEXIST check.

Does anyone else have any idea why this may have changed?

Ian




* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-02  2:50                 ` Andrew Morton
  2006-09-02  4:11                   ` Ian Kent
  2006-09-02  4:49                   ` Ian Kent
@ 2006-09-04 11:52                   ` David Howells
  2006-09-04 11:52                   ` David Howells
  3 siblings, 0 replies; 70+ messages in thread
From: David Howells @ 2006-09-04 11:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Trond Myklebust, David Howells, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel, Ian Kent

Andrew Morton <akpm@osdl.org> wrote:

> The automounter will mount bix:/ on /net/bix.  But I am unable to get it to
> mount bix's /usr/src on /net/bix/usr/src.

From what I can tell, the problem is that the automounter causes a dentry to
be created on the xdev transition point, but the dentry is not set up right to
do in-NFS automounting on the follow_link() op.

By "xdev transition point" I mean a directory exported from the server that
has a different FSID to its parent.  The NFS client detects that and provides
automounting facilities in the following manner:

 (1) A directory dentry is set up in the superblock of the parent FSID to
     represent the transition point.  This dentry has a follow_link() op set.

 (2) When someone tries to traverse the transition point, the follow_link() op
     is invoked.  This causes a new superblock to be created to represent the
     new FSID, and a root directory entry is allocated there.  This new root
     is then mounted over the dentry set up in (1).

 (3) The follow_link() op of the transit dentry returns the mountpoint dentry
     as if a symlink had been transited.  Further transits just see the
     mountpoint and ignore the transit dentry at the bottom of the pile.

Note that an lstat() of the transit dentry does not cause automounting to take
place because lstat() does not follow terminal symlinks, and thus does not
invoke the follow_link() op.


However, when the automounter preemptively creates a dentry there with mkdir,
it can install a directory dentry *without* the appropriate follow_link() op.
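
The lstat() remark can be tried out in userspace with an ordinary symlink
standing in for the transit dentry.  This is only an analogy under stated
assumptions: the helper names are made up for illustration, the scratch path
under /tmp is arbitrary, and nothing here exercises NFS or the follow_link()
op itself.  It shows only that lstat() inspects the final path component
without following it, while stat() tries to walk through it.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* 1 if the final component is itself a symlink, 0 if not, -errno on error.
 * lstat() does not follow a terminal symlink. */
int is_symlink_nofollow(const char *path)
{
	struct stat st;

	assert(path != NULL);
	if (lstat(path, &st) != 0)
		return -errno;
	return S_ISLNK(st.st_mode) ? 1 : 0;
}

/* 0 if the path resolves, -errno otherwise.
 * stat() follows a terminal symlink, so it needs the target to exist. */
int resolve_following(const char *path)
{
	struct stat st;

	assert(path != NULL);
	if (stat(path, &st) != 0)
		return -errno;
	return 0;
}

/* Hypothetical demo: make a dangling symlink in a scratch directory and
 * check both calls.  Returns 0 if lstat() saw the link itself while
 * stat() failed with ENOENT trying to follow it. */
int demo_dangling_symlink(void)
{
	char dir[] = "/tmp/lstat_demo_XXXXXX";
	char link[64];
	int seen, followed;

	if (mkdtemp(dir) == NULL)
		return -errno;
	snprintf(link, sizeof(link), "%s/dangling", dir);
	if (symlink("/nonexistent/lstat-demo-target", link) != 0) {
		int err = errno;

		rmdir(dir);
		return -err;
	}
	seen = is_symlink_nofollow(link);	/* looks at the link only */
	followed = resolve_following(link);	/* walks through the link */
	unlink(link);
	rmdir(dir);
	return (seen == 1 && followed == -ENOENT) ? 0 : -EINVAL;
}
```

Calling demo_dangling_symlink() from a small main() is enough to try it.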

David


* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-02  2:50                 ` Andrew Morton
                                     ` (2 preceding siblings ...)
  2006-09-04 11:52                   ` David Howells
@ 2006-09-04 11:52                   ` David Howells
  2006-09-04 13:24                     ` Ian Kent
  2006-09-05  2:23                     ` Trond Myklebust
  3 siblings, 2 replies; 70+ messages in thread
From: David Howells @ 2006-09-04 11:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Trond Myklebust, David Howells, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel, Ian Kent

Andrew Morton <akpm@osdl.org> wrote:

> sony:/home/akpm> ls -l /net/bix/usr/src
> total 0
> 
> sony:/home/akpm> showmount -e bix
> Export list for bix:
> /           *
> /usr/src    *
> /mnt/export *

Yes, but what's your /etc/exports now?  Not all options appear to showmount.

Can you add "nohide" to the /usr/src and /mnt/export lines and "fsid=0" to the
/ line if you don't currently have them and try again?

> iirc, we decided this is related to the fs-cache infrastructure work which
> went into git-nfs.  I think David can reproduce this?

I'd only reproduced it with SELinux in enforcing mode.

Under such conditions, unless there's a readdir on the root directory, the
subdirs under which exports exist will remain as incorrectly negative
dentries.

The problem is a conjunction of circumstances:

 (1) nfs_lookup() has a shortcut in it that skips contact with the server if
     we're doing a lookup with intent to create.  This leaves an incorrectly
     negative dentry if there _is_ actually an object on the server.

 (2) The mkdir procedure is aborted between the lookup() op and the mkdir() op
     by SELinux (see vfs_mkdir()).  Note that SELinux isn't the _only_ method
     by which the abort can occur.

 (3) One of my patches correctly assigns the security label to the automounted
     root dentry.

 (4) SELinux then aborts the automounter's mkdir() call because the automounter
     does _not_ carry the correct security label to write to the NFS directory.

 (5) The incorrectly set up dentry from (1) remains because the mkdir() op
     is not invoked to set it right.

The only bit I added was (3), but that's not the only circumstance in which
this can occur.


If, for example, I do "chmod a-w /" on the NFS server, I can see the same
effects on the client without the need for SELinux to put its foot in the door.
Automount does:

[pid  3838] mkdir("/net", 0555)         = -1 EEXIST (File exists)
[pid  3838] stat64("/net", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid  3838] mkdir("/net/trash", 0555)   = -1 EEXIST (File exists)
[pid  3838] stat64("/net/trash", {st_mode=S_IFDIR|0555, st_size=1024, ...}) = 0
[pid  3838] mkdir("/net/trash/mnt", 0555) = -1 EACCES (Permission denied)

And where I was listing the disputed directory, I see:

	[root@andromeda ~]# ls -lad /net/trash/usr/src
	drwxr-xr-x 4 root root 1024 Aug 30 10:35 /net/trash/usr/src/
	[root@andromeda ~]#

which isn't what I'd expect.  What I'd expect is:

	[root@andromeda ~]# ls -l /net/trash/usr/src
	total 15
	drwxr-xr-x 3 root root  1024 Aug 30 10:35 debug/
	-rw-r--r-- 1 root root     0 Aug 16 10:01 hello
	drwx------ 2 root root 12288 Aug 16 10:00 lost+found/
	[root@andromeda ~]#

David


* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-04 11:52                   ` David Howells
@ 2006-09-04 13:24                     ` Ian Kent
  2006-09-04 13:46                       ` David Howells
  2006-09-05  1:57                       ` Trond Myklebust
  2006-09-05  2:23                     ` Trond Myklebust
  1 sibling, 2 replies; 70+ messages in thread
From: Ian Kent @ 2006-09-04 13:24 UTC (permalink / raw)
  To: David Howells
  Cc: Andrew Morton, Trond Myklebust, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Mon, 2006-09-04 at 12:52 +0100, David Howells wrote:
> Andrew Morton <akpm@osdl.org> wrote:
> 
> > sony:/home/akpm> ls -l /net/bix/usr/src
> > total 0
> > 
> > sony:/home/akpm> showmount -e bix
> > Export list for bix:
> > /           *
> > /usr/src    *
> > /mnt/export *
> 
> Yes, but what's your /etc/exports now?  Not all options appear to showmount.
> 
> Can you add "nohide" to the /usr/src and /mnt/export lines and "fsid=0" to the
> / line if you don't currently have them and try again?
> 
> > iirc, we decided this is related to the fs-cache infrastructure work which
> > went into git-nfs.  I think David can reproduce this?
> 
> I'd only reproduced it with SELinux in enforcing mode.
> 
> Under such conditions, unless there's a readdir on the root directory, the
> subdirs under which exports exist will remain as incorrectly negative
> dentries.
> 
> The problem is a conjunction of circumstances:
> 
>  (1) nfs_lookup() has a shortcut in it that skips contact with the server if
>      we're doing a lookup with intent to create.  This leaves an incorrectly
>      negative dentry if there _is_ actually an object on the server.
> 
>  (2) The mkdir procedure is aborted between the lookup() op and the mkdir() op
>      by SELinux (see vfs_mkdir()).  Note that SELinux isn't the _only_ method
>      by which the abort can occur.
> 
>  (3) One of my patches correctly assigns the security label to the automounted
>      root dentry.
> 
>  (4) SELinux then aborts the automounter's mkdir() call because the automounter
>      does _not_ carry the correct security label to write to the NFS directory.
> 
>  (5) The incorrectly set up dentry from (1) remains because the the mkdir() op
>      is not invoked to set it right.
> 
> The only bit I added was (3), but that's not the only circumstance in which
> this can occur.
> 
> 
> If, for example, I do "chmod a-w /" on the NFS server, I can see the same
> effects on the client without the need for SELinux to put its foot in the door.
> Automount does:
> 
> [pid  3838] mkdir("/net", 0555)         = -1 EEXIST (File exists)
> [pid  3838] stat64("/net", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid  3838] mkdir("/net/trash", 0555)   = -1 EEXIST (File exists)
> [pid  3838] stat64("/net/trash", {st_mode=S_IFDIR|0555, st_size=1024, ...}) = 0
> [pid  3838] mkdir("/net/trash/mnt", 0555) = -1 EACCES (Permission denied)

This is the point I'm trying to make.
I'm able to reproduce this with exports that don't have "nohide".
The mkdir used to return EEXIST, possibly before getting to the EACCES
test. It appears to be a change in semantic behavior and I can't see
where it is coming from. autofs expects an EEXIST but not an EACCES and
so doesn't perform the mount. I could ignore the EACCES but that would
be cheating.
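
A minimal sketch of the behaviour described here, assuming the automounter
creates each path component with mkdir() and accepts only EEXIST as a benign
failure.  mkdir_path_strict() is a hypothetical stand-in written for this
illustration, not the autofs source.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create every component of path in turn.  EEXIST is treated as success;
 * any other error (EACCES included) stops the walk and is returned as
 * -errno, so the caller would skip the mount.  Note that EEXIST is also
 * tolerated when the final component exists but is not a directory. */
int mkdir_path_strict(const char *path, mode_t mode)
{
	char buf[PATH_MAX];
	size_t len;
	char *p;

	assert(path != NULL);
	len = strlen(path);
	if (len == 0)
		return -ENOENT;
	if (len >= sizeof(buf))
		return -ENAMETOOLONG;
	memcpy(buf, path, len + 1);

	for (p = buf + 1; ; p++) {
		char saved;

		if (*p != '/' && *p != '\0')
			continue;
		saved = *p;
		*p = '\0';		/* cut the path at this component */
		if (mkdir(buf, mode) != 0 && errno != EEXIST)
			return -errno;
		if (saved == '\0')
			break;
		*p = saved;		/* restore and move to the next one */
	}
	return 0;
}
```

With a helper of this shape, whether the kernel reports EEXIST or EACCES for
a component that already exists decides whether the walk carries on or gives
up, which is the difference being discussed.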

> And where I was listing the disputed directory, I see:
> 
> 	[root@andromeda ~]# ls -lad /net/trash/usr/src
> 	drwxr-xr-x 4 root root 1024 Aug 30 10:35 /net/trash/usr/src/
> 	[root@andromeda ~]#
> 
> which isn't what I'd expect.  What I'd expect is:
> 
> 	[root@andromeda ~]# ls -l /net/trash/usr/src
> 	total 15
> 	drwxr-xr-x 3 root root  1024 Aug 30 10:35 debug/
> 	-rw-r--r-- 1 root root     0 Aug 16 10:01 hello
> 	drwx------ 2 root root 12288 Aug 16 10:00 lost+found/
> 	[root@andromeda ~]#
> 
> David



* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-04 13:24                     ` Ian Kent
@ 2006-09-04 13:46                       ` David Howells
  2006-09-04 15:00                         ` Ian Kent
  2006-09-05  4:11                         ` Ian Kent
  2006-09-05  1:57                       ` Trond Myklebust
  1 sibling, 2 replies; 70+ messages in thread
From: David Howells @ 2006-09-04 13:46 UTC (permalink / raw)
  To: Ian Kent
  Cc: David Howells, Andrew Morton, Trond Myklebust, torvalds, steved,
	linux-fsdevel, linux-cachefs, nfsv4, linux-kernel

Ian Kent <raven@themaw.net> wrote:

> This is the point I'm trying to make.
> I'm able to reproduce this with exports that don't have "nohide".
> The mkdir used to return EEXIST, possibly before getting to the EACCES
> test. It appears to be a change in semantic behavior and I can't see
> where it is coming from. autofs expects an EEXIST but not an EACCES and
> so doesn't perform the mount. I could ignore the EACCES but that would
> be cheating.

Here's something you can try:  Look in fs/nfs/dir.c.  Find nfs_lookup().  In
there, find the following lines:

	/* If we're doing an exclusive create, optimize away the lookup */
	if (nfs_is_exclusive_create(dir, nd))
		goto no_entry;

Comment that bit out and see what the effect is.

David


* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-04 13:46                       ` David Howells
@ 2006-09-04 15:00                         ` Ian Kent
  2006-09-05  4:11                         ` Ian Kent
  1 sibling, 0 replies; 70+ messages in thread
From: Ian Kent @ 2006-09-04 15:00 UTC (permalink / raw)
  To: David Howells
  Cc: Andrew Morton, Trond Myklebust, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Mon, 2006-09-04 at 14:46 +0100, David Howells wrote:
> Ian Kent <raven@themaw.net> wrote:
> 
> > This is the point I'm trying to make.
> > I'm able to reproduce this with exports that don't have "nohide".
> > The mkdir used to return EEXIST, possibly before getting to the EACCES
> > test. It appears to be a change in semantic behavior and I can't see
> > where it is coming from. autofs expects an EEXIST but not an EACCES and
> > so doesn't perform the mount. I could ignore the EACCES but that would
> > be cheating.
> 
> Here's something you can try:  Look in fs/nfs/dir.c.  Find nfs_lookup().  In
> there, find the following lines:
> 
> 	/* If we're doing an exclusive create, optimize away the lookup */
> 	if (nfs_is_exclusive_create(dir, nd))
> 		goto no_entry;
> 
> Comment that bit out and see what the effect it.

OK. But tomorrow.
I'll let you know.

Ian




* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-01 16:34             ` Andrew Morton
  2006-09-01 17:00               ` Trond Myklebust
@ 2006-09-04 18:20               ` David Howells
  1 sibling, 0 replies; 70+ messages in thread
From: David Howells @ 2006-09-04 18:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Howells, torvalds, steved, trond.myklebust, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel


Andrew Morton <akpm@osdl.org> wrote:

> I fixed it all up, I think.  Please review-and-test rc5-mm1

It seems to work okay, and you seem to have made a change to fs/afs/file.c
that matches the one that I've made, so thanks.

> (including hot-fixes/ contents, which grows apace).

 (*) drivers-md-kconfig-fix-block-dependency.patch

     I ACK'd Adrian's patch, and the change it makes appears in Jens's block
     GIT tree, even if the patch itself does not AFAICT.

David


* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-04 13:24                     ` Ian Kent
  2006-09-04 13:46                       ` David Howells
@ 2006-09-05  1:57                       ` Trond Myklebust
  2006-09-05  2:55                         ` Ian Kent
  2006-09-05  9:57                         ` David Howells
  1 sibling, 2 replies; 70+ messages in thread
From: Trond Myklebust @ 2006-09-05  1:57 UTC (permalink / raw)
  To: Ian Kent
  Cc: David Howells, Andrew Morton, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Mon, 2006-09-04 at 21:24 +0800, Ian Kent wrote:

> > [pid  3838] mkdir("/net", 0555)         = -1 EEXIST (File exists)
> > [pid  3838] stat64("/net", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> > [pid  3838] mkdir("/net/trash", 0555)   = -1 EEXIST (File exists)
> > [pid  3838] stat64("/net/trash", {st_mode=S_IFDIR|0555, st_size=1024, ...}) = 0
> > [pid  3838] mkdir("/net/trash/mnt", 0555) = -1 EACCES (Permission denied)
> 
> This is the point I'm trying to make.
> I'm able to reproduce this with exports that don't have "nohide".
> The mkdir used to return EEXIST, possibly before getting to the EACCES
> test. It appears to be a change in semantic behavior and I can't see
> where it is coming from. autofs expects an EEXIST but not an EACCES and
> so doesn't perform the mount. I could ignore the EACCES but that would
> be cheating.

Why the hell is it doing a mkdir in the first place? ...and why the hell
is it not able to cope with EACCES? The latter is hardly an unlikely
reply: it means that the automounter should not be doing this in the
first place, 'cos it doesn't have the privileges. That is not the same
as saying that it doesn't have the privileges to do a lookup.

Trond



* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-04 11:52                   ` David Howells
  2006-09-04 13:24                     ` Ian Kent
@ 2006-09-05  2:23                     ` Trond Myklebust
  2006-09-05  3:01                       ` Ian Kent
  2006-09-05  4:06                       ` Ian Kent
  1 sibling, 2 replies; 70+ messages in thread
From: Trond Myklebust @ 2006-09-05  2:23 UTC (permalink / raw)
  To: David Howells
  Cc: Andrew Morton, torvalds, steved, linux-fsdevel, linux-cachefs,
	nfsv4, linux-kernel, Ian Kent

[-- Attachment #1: Type: text/plain, Size: 3389 bytes --]

On Mon, 2006-09-04 at 12:52 +0100, David Howells wrote:
> Andrew Morton <akpm@osdl.org> wrote:
> 
> > sony:/home/akpm> ls -l /net/bix/usr/src
> > total 0
> > 
> > sony:/home/akpm> showmount -e bix
> > Export list for bix:
> > /           *
> > /usr/src    *
> > /mnt/export *
> 
> Yes, but what's your /etc/exports now?  Not all options appear to showmount.
> 
> Can you add "nohide" to the /usr/src and /mnt/export lines and "fsid=0" to the
> / line if you don't currently have them and try again?
> 
> > iirc, we decided this is related to the fs-cache infrastructure work which
> > went into git-nfs.  I think David can reproduce this?
> 
> I'd only reproduced it with SELinux in enforcing mode.
> 
> Under such conditions, unless there's a readdir on the root directory, the
> subdirs under which exports exist will remain as incorrectly negative
> dentries.
> 
> The problem is a conjunction of circumstances:
> 
>  (1) nfs_lookup() has a shortcut in it that skips contact with the server if
>      we're doing a lookup with intent to create.  This leaves an incorrectly
>      negative dentry if there _is_ actually an object on the server.
> 
>  (2) The mkdir procedure is aborted between the lookup() op and the mkdir() op
>      by SELinux (see vfs_mkdir()).  Note that SELinux isn't the _only_ method
>      by which the abort can occur.
> 
>  (3) One of my patches correctly assigns the security label to the automounted
>      root dentry.
> 
>  (4) SELinux then aborts the automounter's mkdir() call because the automounter
>      does _not_ carry the correct security label to write to the NFS directory.
> 
>  (5) The incorrectly set up dentry from (1) remains because the the mkdir() op
>      is not invoked to set it right.
> 
> The only bit I added was (3), but that's not the only circumstance in which
> this can occur.
> 
> 
> If, for example, I do "chmod a-w /" on the NFS server, I can see the same
> effects on the client without the need for SELinux to put its foot in the door.
> Automount does:
> 
> [pid  3838] mkdir("/net", 0555)         = -1 EEXIST (File exists)
> [pid  3838] stat64("/net", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid  3838] mkdir("/net/trash", 0555)   = -1 EEXIST (File exists)
> [pid  3838] stat64("/net/trash", {st_mode=S_IFDIR|0555, st_size=1024, ...}) = 0
> [pid  3838] mkdir("/net/trash/mnt", 0555) = -1 EACCES (Permission denied)
> 
> And where I was listing the disputed directory, I see:
> 
> 	[root@andromeda ~]# ls -lad /net/trash/usr/src
> 	drwxr-xr-x 4 root root 1024 Aug 30 10:35 /net/trash/usr/src/
> 	[root@andromeda ~]#
> 
> which isn't what I'd expect.  What I'd expect is:
> 
> 	[root@andromeda ~]# ls -l /net/trash/usr/src
> 	total 15
> 	drwxr-xr-x 3 root root  1024 Aug 30 10:35 debug/
> 	-rw-r--r-- 1 root root     0 Aug 16 10:01 hello
> 	drwx------ 2 root root 12288 Aug 16 10:00 lost+found/
> 	[root@andromeda ~]#

One way to fix this is to simply not hash the dentry when we're doing
the O_EXCL intent optimisation, but rather to only hash it _after_ we've
successfully created the file on the server. Something like the attached
patch ought to do it.
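
The difference between hashing and not hashing the dentry can be pictured
with a toy userspace model.  This is a deliberately simplified sketch, not
kernel code: struct toy_dentry and the toy_* functions are invented for
illustration, and the model assumes that a hashed dentry is trusted by later
lookups without revalidation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for one dcache entry; not the kernel's struct dentry. */
struct toy_dentry {
	bool hashed;	/* visible to later lookups */
	bool positive;	/* believed to have an inode */
};

/* Exclusive-create lookup: the server is not asked, so the dentry starts
 * out negative.  hash_it selects the old behaviour (true) or the
 * behaviour the patch aims for (false). */
static void toy_lookup_excl(struct toy_dentry *d, bool hash_it)
{
	assert(d != NULL);
	d->positive = false;
	d->hashed = hash_it;
}

/* Ordinary lookup: a hashed dentry is trusted as-is; otherwise the
 * "server" is consulted and the answer is cached. */
static bool toy_lookup(struct toy_dentry *d, bool exists_on_server)
{
	assert(d != NULL);
	if (!d->hashed) {
		d->positive = exists_on_server;
		d->hashed = true;
	}
	return d->positive;
}

/* mkdir on a name that already exists on the server, aborted by a
 * permission check after the lookup.  Returns what a later lookup sees. */
bool toy_lookup_after_aborted_mkdir(bool hash_it)
{
	struct toy_dentry d = { false, false };

	toy_lookup_excl(&d, hash_it);
	/* permission check fails here, so the create step never runs */
	return toy_lookup(&d, true);
}
```

In this model, toy_lookup_after_aborted_mkdir(true) reports the directory as
missing because the stale negative entry is found first, while passing false
lets the later lookup reach the server.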

Note, though, that this will not fix the autofs problem: autofs is
trying to perform a totally unnecessary mkdir(), and is giving up when
it is told that SELinux won't authorise that particular operation. This
is clearly an autofs bug...

Cheers,
  Trond

[-- Attachment #2: linux-2.6.18-063-fix_exclusive_create.dif --]
[-- Type: message/rfc822, Size: 1221 bytes --]

From: Trond Myklebust <Trond.Myklebust@netapp.com>
Subject: No Subject
Date: 
Message-ID: <1157422828.5510.19.camel@localhost>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/dir.c |   13 ++++++++++---
 1 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 51328ae..e83a2ff 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -904,9 +904,14 @@ static struct dentry *nfs_lookup(struct 
 
 	lock_kernel();
 
-	/* If we're doing an exclusive create, optimize away the lookup */
-	if (nfs_is_exclusive_create(dir, nd))
-		goto no_entry;
+	/*
+	 * If we're doing an exclusive create, optimize away the lookup
+	 * but don't hash the dentry.
+	 */
+	if (nfs_is_exclusive_create(dir, nd)) {
+		d_instantiate(dentry, NULL);
+		goto out_unlock;
+	}
 
 	error = NFS_PROTO(dir)->lookup(dir, &dentry->d_name, &fhandle, &fattr);
 	if (error == -ENOENT)
@@ -1161,6 +1166,8 @@ int nfs_instantiate(struct dentry *dentr
 	if (IS_ERR(inode))
 		return error;
 	d_instantiate(dentry, inode);
+	if (d_unhashed(dentry))
+		d_rehash(dentry);
 	return 0;
 }
 


* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-05  1:57                       ` Trond Myklebust
@ 2006-09-05  2:55                         ` Ian Kent
  2006-09-05  3:50                           ` Trond Myklebust
  2006-09-05  9:48                           ` David Howells
  2006-09-05  9:57                         ` David Howells
  1 sibling, 2 replies; 70+ messages in thread
From: Ian Kent @ 2006-09-05  2:55 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: David Howells, Andrew Morton, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Mon, 2006-09-04 at 21:57 -0400, Trond Myklebust wrote:
> On Mon, 2006-09-04 at 21:24 +0800, Ian Kent wrote:
> 
> > > [pid  3838] mkdir("/net", 0555)         = -1 EEXIST (File exists)
> > > [pid  3838] stat64("/net", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> > > [pid  3838] mkdir("/net/trash", 0555)   = -1 EEXIST (File exists)
> > > [pid  3838] stat64("/net/trash", {st_mode=S_IFDIR|0555, st_size=1024, ...}) = 0
> > > [pid  3838] mkdir("/net/trash/mnt", 0555) = -1 EACCES (Permission denied)
> > 
> > This is the point I'm trying to make.
> > I'm able to reproduce this with exports that don't have "nohide".
> > The mkdir used to return EEXIST, possibly before getting to the EACCES
> > test. It appears to be a change in semantic behavior and I can't see
> > where it is coming from. autofs expects an EEXIST but not an EACCES and
> > so doesn't perform the mount. I could ignore the EACCES but that would
> > be cheating.
> 
> Why the hell is it doing a mkdir in the first place? ...and why the hell
> is it not able to cope with EACCES? The latter is hardly an unlikely
> reply: it means that the automounter should not be doing this in the
> first place, 'cos it doesn't have the privileges. That is not the same
> as saying that it doesn't have the privileges to do a lookup.

Why the hell shouldn't it be able to do an mkdir!

It is coping with the EACCES return by not mounting the filesystem,
which is the correct response in this case.





* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-05  2:23                     ` Trond Myklebust
@ 2006-09-05  3:01                       ` Ian Kent
  2006-09-05  4:05                         ` Trond Myklebust
  2006-09-05  4:06                       ` Ian Kent
  1 sibling, 1 reply; 70+ messages in thread
From: Ian Kent @ 2006-09-05  3:01 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: David Howells, Andrew Morton, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Mon, 2006-09-04 at 22:23 -0400, Trond Myklebust wrote:
> On Mon, 2006-09-04 at 12:52 +0100, David Howells wrote:
> > Andrew Morton <akpm@osdl.org> wrote:
> > 
> > > sony:/home/akpm> ls -l /net/bix/usr/src
> > > total 0
> > > 
> > > sony:/home/akpm> showmount -e bix
> > > Export list for bix:
> > > /           *
> > > /usr/src    *
> > > /mnt/export *
> > 
> > Yes, but what's your /etc/exports now?  Not all options appear to showmount.
> > 
> > Can you add "nohide" to the /usr/src and /mnt/export lines and "fsid=0" to the
> > / line if you don't currently have them and try again?
> > 
> > > iirc, we decided this is related to the fs-cache infrastructure work which
> > > went into git-nfs.  I think David can reproduce this?
> > 
> > I'd only reproduced it with SELinux in enforcing mode.
> > 
> > Under such conditions, unless there's a readdir on the root directory, the
> > subdirs under which exports exist will remain as incorrectly negative
> > dentries.
> > 
> > The problem is a conjunction of circumstances:
> > 
> >  (1) nfs_lookup() has a shortcut in it that skips contact with the server if
> >      we're doing a lookup with intent to create.  This leaves an incorrectly
> >      negative dentry if there _is_ actually an object on the server.
> > 
> >  (2) The mkdir procedure is aborted between the lookup() op and the mkdir() op
> >      by SELinux (see vfs_mkdir()).  Note that SELinux isn't the _only_ method
> >      by which the abort can occur.
> > 
> >  (3) One of my patches correctly assigns the security label to the automounted
> >      root dentry.
> > 
> >  (4) SELinux then aborts the automounter's mkdir() call because the automounter
> >      does _not_ carry the correct security label to write to the NFS directory.
> > 
> >  (5) The incorrectly set up dentry from (1) remains because the the mkdir() op
> >      is not invoked to set it right.
> > 
> > The only bit I added was (3), but that's not the only circumstance in which
> > this can occur.
> > 
> > 
> > If, for example, I do "chmod a-w /" on the NFS server, I can see the same
> > effects on the client without the need for SELinux to put its foot in the door.
> > Automount does:
> > 
> > [pid  3838] mkdir("/net", 0555)         = -1 EEXIST (File exists)
> > [pid  3838] stat64("/net", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> > [pid  3838] mkdir("/net/trash", 0555)   = -1 EEXIST (File exists)
> > [pid  3838] stat64("/net/trash", {st_mode=S_IFDIR|0555, st_size=1024, ...}) = 0
> > [pid  3838] mkdir("/net/trash/mnt", 0555) = -1 EACCES (Permission denied)
> > 
> > And where I was listing the disputed directory, I see:
> > 
> > 	[root@andromeda ~]# ls -lad /net/trash/usr/src
> > 	drwxr-xr-x 4 root root 1024 Aug 30 10:35 /net/trash/usr/src/
> > 	[root@andromeda ~]#
> > 
> > which isn't what I'd expect.  What I'd expect is:
> > 
> > 	[root@andromeda ~]# ls -l /net/trash/usr/src
> > 	total 15
> > 	drwxr-xr-x 3 root root  1024 Aug 30 10:35 debug/
> > 	-rw-r--r-- 1 root root     0 Aug 16 10:01 hello
> > 	drwx------ 2 root root 12288 Aug 16 10:00 lost+found/
> > 	[root@andromeda ~]#
> 
> One way to fix this is to simply not hash the dentry when we're doing
> the O_EXCL intent optimisation, but rather to only hash it _after_ we've
> successfully created the file on the server. Something like the attached
> patch ought to do it.
> 
> Note, though, that this will not fix the autofs problem: autofs is
> trying to perform a totally unnecessary mkdir(), and is giving up when
> it is told that SELinux won't authorise that particular operation. This
> is clearly an autofs bug...

SELinux is not involved in this scenario.





* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-05  2:55                         ` Ian Kent
@ 2006-09-05  3:50                           ` Trond Myklebust
  2006-09-05  4:03                             ` Ian Kent
  2006-09-05  9:48                           ` David Howells
  1 sibling, 1 reply; 70+ messages in thread
From: Trond Myklebust @ 2006-09-05  3:50 UTC (permalink / raw)
  To: Ian Kent
  Cc: David Howells, Andrew Morton, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Tue, 2006-09-05 at 10:55 +0800, Ian Kent wrote:
> On Mon, 2006-09-04 at 21:57 -0400, Trond Myklebust wrote:
> > On Mon, 2006-09-04 at 21:24 +0800, Ian Kent wrote:
> > 
> > > > [pid  3838] mkdir("/net", 0555)         = -1 EEXIST (File exists)
> > > > [pid  3838] stat64("/net", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> > > > [pid  3838] mkdir("/net/trash", 0555)   = -1 EEXIST (File exists)
> > > > [pid  3838] stat64("/net/trash", {st_mode=S_IFDIR|0555, st_size=1024, ...}) = 0
> > > > [pid  3838] mkdir("/net/trash/mnt", 0555) = -1 EACCES (Permission denied)
> > > 
> > > This is the point I'm trying to make.
> > > I'm able to reproduce this with exports that don't have "nohide".
> > > The mkdir used to return EEXIST, possibly before getting to the EACCES
> > > test. It appears to be a change in semantic behavior and I can't see
> > > where it is coming from. autofs expects an EEXIST but not an EACCES and
> > > so doesn't perform the mount. I could ignore the EACCES but that would
> > > be cheating.
> > 
> > Why the hell is it doing a mkdir in the first place? ...and why the hell
> > is it not able to cope with EACCES? The latter is hardly an unlikely
> > reply: it means that the automounter should not be doing this in the
> > first place, 'cos it doesn't have the privileges. That is not the same
> > as saying that it doesn't have the privileges to do a lookup.
> 
> Why the hell shouldn't it be able to do an mkdir!

Firstly, if the call to mkdir actually _was_ successful, it would be
creating a new directory on the NFS server, and it would be doing so
with the automounter's privileges instead of the user's privileges. Why
would I want it to do that?

Secondly, and more pertinently to this case, you have no guarantee that
the automounter has _any_ privileges on the server at all other than
what is required to mount a filesystem. selinux is enforcing that on the
client side here, but the server could just as well be set up to do the
same (in fact, you could set up selinux to do the exact same thing on
the server).

IOW, the automounter should just be calling stat('/net/trash/mnt'). It
shouldn't be trying to create directories on the server at all.

> It is coping with the EACCES return by not mounting the filesystem
> which is the correct response in this case.

No it isn't. The directory exists. It can be looked up. There is no
reason why you can't mount something on top of it.

Being permitted to do mkdir() or not has nothing to do with anything.



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-05  3:50                           ` Trond Myklebust
@ 2006-09-05  4:03                             ` Ian Kent
  2006-09-05  4:53                               ` Trond Myklebust
  0 siblings, 1 reply; 70+ messages in thread
From: Ian Kent @ 2006-09-05  4:03 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: David Howells, Andrew Morton, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Mon, 2006-09-04 at 23:50 -0400, Trond Myklebust wrote:
> On Tue, 2006-09-05 at 10:55 +0800, Ian Kent wrote:
> > On Mon, 2006-09-04 at 21:57 -0400, Trond Myklebust wrote:
> > > On Mon, 2006-09-04 at 21:24 +0800, Ian Kent wrote:
> > > 
> > > > > [pid  3838] mkdir("/net", 0555)         = -1 EEXIST (File exists)
> > > > > [pid  3838] stat64("/net", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> > > > > [pid  3838] mkdir("/net/trash", 0555)   = -1 EEXIST (File exists)
> > > > > [pid  3838] stat64("/net/trash", {st_mode=S_IFDIR|0555, st_size=1024, ...}) = 0
> > > > > [pid  3838] mkdir("/net/trash/mnt", 0555) = -1 EACCES (Permission denied)
> > > > 
> > > > This is the point I'm trying to make.
> > > > I'm able to reproduce this with exports that don't have "nohide".
> > > > The mkdir used to return EEXIST, possibly before getting to the EACCES
> > > > test. It appears to be a change in semantic behavior and I can't see
> > > > where it is coming from. autofs expects an EEXIST but not an EACCES and
> > > > so doesn't perform the mount. I could ignore the EACCES but that would
> > > > be cheating.
> > > 
> > > Why the hell is it doing a mkdir in the first place? ...and why the hell
> > > is it not able to cope with EACCES? The latter is hardly an unlikely
> > > reply: it means that the automounter should not be doing this in the
> > > first place, 'cos it doesn't have the privileges. That is not the same
> > > as saying that it doesn't have the privileges to do a lookup.
> > 
> > Why the hell shouldn't it be able to do an mkdir!
> 
> Firstly, if the call to mkdir actually _was_ successful, it would be
> creating a new directory on the NFS server, and it would be doing so
> with the automounter's privileges instead of the user's privileges. Why
> would I want it to do that?
> 
> Secondly, and more pertinently to this case, you have no guarantee that
> the automounter has _any_ privileges on the server at all other than
> what is required to mount a filesystem. selinux is enforcing that on the
> client side here, but the server could just as well be set up to do the
> same (in fact, you could set up selinux to do the exact same thing on
> the server).
> 
> IOW, the automounter should just be calling stat('/net/trash/mnt'). It
> shouldn't be trying to create directories on the server at all.

Sure, but this is an old version of autofs that is still in use, so
changing the expected behavior of a system call is not acceptable, and I
expect other applications may well have a problem with this too.

> 
> > It is coping with the EACCES return by not mounting the filesystem
> > which is the correct response in this case.
> 
> No it isn't. The directory exists. It can be looked up. There is no
> reason why you can't mount something on top of it.
> 
> Being permitted to do mkdir() or not has nothing to do with anything.

Agreed.

The fact that it's a mkdir is irrelevant. Given that nfs_lookup is
returning EACCES instead of EEXIST, this will likely affect other
system calls such as "stat". I'll check this.




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-05  3:01                       ` Ian Kent
@ 2006-09-05  4:05                         ` Trond Myklebust
  0 siblings, 0 replies; 70+ messages in thread
From: Trond Myklebust @ 2006-09-05  4:05 UTC (permalink / raw)
  To: Ian Kent
  Cc: David Howells, Andrew Morton, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Tue, 2006-09-05 at 11:01 +0800, Ian Kent wrote:
> On Mon, 2006-09-04 at 22:23 -0400, Trond Myklebust wrote:
> > On Mon, 2006-09-04 at 12:52 +0100, David Howells wrote:
> > > Andrew Morton <akpm@osdl.org> wrote:
> > > 
> > > > sony:/home/akpm> ls -l /net/bix/usr/src
> > > > total 0
> > > > 
> > > > sony:/home/akpm> showmount -e bix
> > > > Export list for bix:
> > > > /           *
> > > > /usr/src    *
> > > > /mnt/export *
> > > 
> > > Yes, but what's your /etc/exports now?  Not all options appear to showmount.
> > > 
> > > Can you add "nohide" to the /usr/src and /mnt/export lines and "fsid=0" to the
> > > / line if you don't currently have them and try again?
> > > 
> > > > iirc, we decided this is related to the fs-cache infrastructure work which
> > > > went into git-nfs.  I think David can reproduce this?
> > > 
> > > I'd only reproduced it with SELinux in enforcing mode.
> > > 
> > > Under such conditions, unless there's a readdir on the root directory, the
> > > subdirs under which exports exist will remain as incorrectly negative
> > > dentries.
> > > 
> > > The problem is a conjunction of circumstances:
> > > 
> > >  (1) nfs_lookup() has a shortcut in it that skips contact with the server if
> > >      we're doing a lookup with intent to create.  This leaves an incorrectly
> > >      negative dentry if there _is_ actually an object on the server.
> > > 
> > >  (2) The mkdir procedure is aborted between the lookup() op and the mkdir() op
> > >      by SELinux (see vfs_mkdir()).  Note that SELinux isn't the _only_ method
> > >      by which the abort can occur.
> > > 
> > >  (3) One of my patches correctly assigns the security label to the automounted
> > >      root dentry.
> > > 
> > >  (4) SELinux then aborts the automounter's mkdir() call because the automounter
> > >      does _not_ carry the correct security label to write to the NFS directory.
> > > 
> > > >  (5) The incorrectly set up dentry from (1) remains because the mkdir() op
> > >      is not invoked to set it right.
> > > 
> > > The only bit I added was (3), but that's not the only circumstance in which
> > > this can occur.
> > > 
> > > 
> > > If, for example, I do "chmod a-w /" on the NFS server, I can see the same
> > > effects on the client without the need for SELinux to put its foot in the door.
> > > Automount does:
> > > 
> > > [pid  3838] mkdir("/net", 0555)         = -1 EEXIST (File exists)
> > > [pid  3838] stat64("/net", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> > > [pid  3838] mkdir("/net/trash", 0555)   = -1 EEXIST (File exists)
> > > [pid  3838] stat64("/net/trash", {st_mode=S_IFDIR|0555, st_size=1024, ...}) = 0
> > > [pid  3838] mkdir("/net/trash/mnt", 0555) = -1 EACCES (Permission denied)
> > > 
> > > And where I was listing the disputed directory, I see:
> > > 
> > > 	[root@andromeda ~]# ls -lad /net/trash/usr/src
> > > 	drwxr-xr-x 4 root root 1024 Aug 30 10:35 /net/trash/usr/src/
> > > 	[root@andromeda ~]#
> > > 
> > > which isn't what I'd expect.  What I'd expect is:
> > > 
> > > 	[root@andromeda ~]# ls -l /net/trash/usr/src
> > > 	total 15
> > > 	drwxr-xr-x 3 root root  1024 Aug 30 10:35 debug/
> > > 	-rw-r--r-- 1 root root     0 Aug 16 10:01 hello
> > > 	drwx------ 2 root root 12288 Aug 16 10:00 lost+found/
> > > 	[root@andromeda ~]#
> > 
> > One way to fix this is to simply not hash the dentry when we're doing
> > the O_EXCL intent optimisation, but rather to only hash it _after_ we've
> > successfully created the file on the server. Something like the attached
> > patch ought to do it.
> > 
> > Note, though, that this will not fix the autofs problem: autofs is
> > trying to perform a totally unnecessary mkdir(), and is giving up when
> > it is told that SELinux won't authorise that particular operation. This
> > is clearly an autofs bug...
> 
> > SELinux is not involved in this scenario.

Right. David rigged the NFS server to produce the same result as selinux
on autofs. He is demonstrating that autofs is wrong to call mkdir().

As I said above, my patch fixes the problem that David noted in (1)
above, namely that a negative dentry is incorrectly instantiated when
the mkdir() (or other exclusive create) operation fails due to a
permission error.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-05  2:23                     ` Trond Myklebust
  2006-09-05  3:01                       ` Ian Kent
@ 2006-09-05  4:06                       ` Ian Kent
  2006-09-05  4:57                         ` Trond Myklebust
  1 sibling, 1 reply; 70+ messages in thread
From: Ian Kent @ 2006-09-05  4:06 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: David Howells, Andrew Morton, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Mon, 2006-09-04 at 22:23 -0400, Trond Myklebust wrote:
> On Mon, 2006-09-04 at 12:52 +0100, David Howells wrote:
> > Andrew Morton <akpm@osdl.org> wrote:
> > 
> > > sony:/home/akpm> ls -l /net/bix/usr/src
> > > total 0
> > > 
> > > sony:/home/akpm> showmount -e bix
> > > Export list for bix:
> > > /           *
> > > /usr/src    *
> > > /mnt/export *
> > 
> > Yes, but what's your /etc/exports now?  Not all options appear to showmount.
> > 
> > Can you add "nohide" to the /usr/src and /mnt/export lines and "fsid=0" to the
> > / line if you don't currently have them and try again?
> > 
> > > iirc, we decided this is related to the fs-cache infrastructure work which
> > > went into git-nfs.  I think David can reproduce this?
> > 
> > I'd only reproduced it with SELinux in enforcing mode.
> > 
> > Under such conditions, unless there's a readdir on the root directory, the
> > subdirs under which exports exist will remain as incorrectly negative
> > dentries.
> > 
> > The problem is a conjunction of circumstances:
> > 
> >  (1) nfs_lookup() has a shortcut in it that skips contact with the server if
> >      we're doing a lookup with intent to create.  This leaves an incorrectly
> >      negative dentry if there _is_ actually an object on the server.
> > 
> >  (2) The mkdir procedure is aborted between the lookup() op and the mkdir() op
> >      by SELinux (see vfs_mkdir()).  Note that SELinux isn't the _only_ method
> >      by which the abort can occur.
> > 
> >  (3) One of my patches correctly assigns the security label to the automounted
> >      root dentry.
> > 
> >  (4) SELinux then aborts the automounter's mkdir() call because the automounter
> >      does _not_ carry the correct security label to write to the NFS directory.
> > 
> >  (5) The incorrectly set up dentry from (1) remains because the mkdir() op
> >      is not invoked to set it right.
> > 
> > The only bit I added was (3), but that's not the only circumstance in which
> > this can occur.
> > 
> > 
> > If, for example, I do "chmod a-w /" on the NFS server, I can see the same
> > effects on the client without the need for SELinux to put its foot in the door.
> > Automount does:
> > 
> > [pid  3838] mkdir("/net", 0555)         = -1 EEXIST (File exists)
> > [pid  3838] stat64("/net", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> > [pid  3838] mkdir("/net/trash", 0555)   = -1 EEXIST (File exists)
> > [pid  3838] stat64("/net/trash", {st_mode=S_IFDIR|0555, st_size=1024, ...}) = 0
> > [pid  3838] mkdir("/net/trash/mnt", 0555) = -1 EACCES (Permission denied)
> > 
> > And where I was listing the disputed directory, I see:
> > 
> > 	[root@andromeda ~]# ls -lad /net/trash/usr/src
> > 	drwxr-xr-x 4 root root 1024 Aug 30 10:35 /net/trash/usr/src/
> > 	[root@andromeda ~]#
> > 
> > which isn't what I'd expect.  What I'd expect is:
> > 
> > 	[root@andromeda ~]# ls -l /net/trash/usr/src
> > 	total 15
> > 	drwxr-xr-x 3 root root  1024 Aug 30 10:35 debug/
> > 	-rw-r--r-- 1 root root     0 Aug 16 10:01 hello
> > 	drwx------ 2 root root 12288 Aug 16 10:00 lost+found/
> > 	[root@andromeda ~]#
> 
> One way to fix this is to simply not hash the dentry when we're doing
> the O_EXCL intent optimisation, but rather to only hash it _after_ we've
> successfully created the file on the server. Something like the attached
> patch ought to do it.

No.

This patch simply marks the dentry negative and returns ENOMEM from the
lookup which, as would be expected, results in this error being returned
to userspace.




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-04 13:46                       ` David Howells
  2006-09-04 15:00                         ` Ian Kent
@ 2006-09-05  4:11                         ` Ian Kent
  2006-09-05  4:17                           ` Trond Myklebust
  1 sibling, 1 reply; 70+ messages in thread
From: Ian Kent @ 2006-09-05  4:11 UTC (permalink / raw)
  To: David Howells
  Cc: Andrew Morton, Trond Myklebust, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Mon, 2006-09-04 at 14:46 +0100, David Howells wrote:
> Ian Kent <raven@themaw.net> wrote:
> 
> > This is the point I'm trying to make.
> > I'm able to reproduce this with exports that don't have "nohide".
> > The mkdir used to return EEXIST, possibly before getting to the EACCES
> > test. It appears to be a change in semantic behavior and I can't see
> > where it is coming from. autofs expects an EEXIST but not an EACCES and
> > so doesn't perform the mount. I could ignore the EACCES but that would
> > be cheating.
> 
> Here's something you can try:  Look in fs/nfs/dir.c.  Find nfs_lookup().  In
> there, find the following lines:
> 
> 	/* If we're doing an exclusive create, optimize away the lookup */
> 	if (nfs_is_exclusive_create(dir, nd))
> 		goto no_entry;
> 
> Comment that bit out and see what the effect is.

Yes.

Commenting this out results in the expected behavior.
No doubt this is because the following NFS lookup to the server returns
EEXIST which is then returned to userspace.

Ian



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-05  4:11                         ` Ian Kent
@ 2006-09-05  4:17                           ` Trond Myklebust
  0 siblings, 0 replies; 70+ messages in thread
From: Trond Myklebust @ 2006-09-05  4:17 UTC (permalink / raw)
  To: Ian Kent
  Cc: David Howells, Andrew Morton, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Tue, 2006-09-05 at 12:11 +0800, Ian Kent wrote:
> On Mon, 2006-09-04 at 14:46 +0100, David Howells wrote:
> > Ian Kent <raven@themaw.net> wrote:
> > 
> > > This is the point I'm trying to make.
> > > I'm able to reproduce this with exports that don't have "nohide".
> > > The mkdir used to return EEXIST, possibly before getting to the EACCES
> > > test. It appears to be a change in semantic behavior and I can't see
> > > where it is coming from. autofs expects an EEXIST but not an EACCES and
> > > so doesn't perform the mount. I could ignore the EACCES but that would
> > > be cheating.
> > 
> > Here's something you can try:  Look in fs/nfs/dir.c.  Find nfs_lookup().  In
> > there, find the following lines:
> > 
> > 	/* If we're doing an exclusive create, optimize away the lookup */
> > 	if (nfs_is_exclusive_create(dir, nd))
> > 		goto no_entry;
> > 
> > Comment that bit out and see what the effect is.
> 
> Yes.
> 
> Commenting this out results in the expected behavior.
> No doubt this is because the following NFS lookup to the server returns
> EEXIST which is then returned to userspace.

It also breaks open(O_EXCL). Removing that is unacceptable.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-05  4:03                             ` Ian Kent
@ 2006-09-05  4:53                               ` Trond Myklebust
  2006-09-05  6:06                                 ` Ian Kent
  0 siblings, 1 reply; 70+ messages in thread
From: Trond Myklebust @ 2006-09-05  4:53 UTC (permalink / raw)
  To: Ian Kent
  Cc: David Howells, Andrew Morton, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Tue, 2006-09-05 at 12:03 +0800, Ian Kent wrote:
> Sure but this is an old version of autofs which is in use so changing
> the expected behavior of a system call is not acceptable and I expect
> other applications may well have a problem with this also.

Applications that rely on mkdir() to never return EACCES are broken.
Particularly so in an selinux system (as was the case here).

Note that an ordinary application will not see this: if I do

Machine 1				Machine 2
---------				---------
mkdir foo
					mkdir foo/bar
					chmod 555 foo/bar foo
mkdir foo/bar
mkdir: cannot create directory `foo/bar': File exists

i.e. as expected. So this really only affects applications that are not
supposed to be calling mkdir() in the first place.
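
The ordinary-application case above can be approximated on one machine.
What follows is only a rough sketch and not something posted in this
thread: the helper name mkdir_existing_in_readonly_parent() is made up,
the /tmp location is an assumption, and it runs against a local
filesystem rather than NFS. On a local Linux filesystem the repeated
mkdir() would be expected to report EEXIST, whatever the write
permission on the parent.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): create foo and foo/bar under
 * a fresh temporary directory, make both read-only (0555), then try to
 * mkdir foo/bar again.  Returns the errno of that last mkdir(), 0 if it
 * unexpectedly succeeded, or -1 if the setup itself failed.
 */
int mkdir_existing_in_readonly_parent(void)
{
	char base[] = "/tmp/mkdir-demo-XXXXXX";
	char foo[64], bar[64];
	int n, err;

	if (mkdtemp(base) == NULL)
		return -1;

	n = snprintf(foo, sizeof(foo), "%s/foo", base);
	assert(n > 0 && (size_t)n < sizeof(foo));
	n = snprintf(bar, sizeof(bar), "%s/foo/bar", base);
	assert(n > 0 && (size_t)n < sizeof(bar));

	if (mkdir(foo, 0755) != 0 || mkdir(bar, 0755) != 0)
		return -1;
	if (chmod(bar, 0555) != 0 || chmod(foo, 0555) != 0)
		return -1;

	/* The name already exists, so this cannot succeed. */
	err = (mkdir(bar, 0555) == 0) ? 0 : errno;

	/* Best-effort cleanup; restore write permission on foo first. */
	chmod(foo, 0755);
	rmdir(bar);
	rmdir(foo);
	rmdir(base);
	return err;
}
```

A caller would compare the return value against EEXIST and EACCES to see
which of the two checks the filesystem reached first.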

> > > It is coping with the EACCES return by not mounting the filesystem
> > > which is the correct response in this case.
> > 
> > No it isn't. The directory exists. It can be looked up. There is no
> > reason why you can't mount something on top of it.
> > 
> > Being permitted to do mkdir() or not has nothing to do with anything.
> 
> Agreed.
> 
> The fact that it's a mkdir is irrelevant given that nfs_lookup is
> returning an EACCES instead of EEXIST this will likely affect other
> system calls such as "stat". I'll check this.

In both cases, the call to vfs_mkdir() is returning the EACCES. Not
nfs_lookup. The reason why we are no longer returning EEXIST is because
the intents have changed due to the patch

http://kernel.org/git/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a634904a7de0d3a0bc606f608007a34e8c05bfee;hp=ddeff520f02b92128132c282c350fa72afffb84a

I suspect that reverting that patch would 'fix' the autofs bug.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-05  4:06                       ` Ian Kent
@ 2006-09-05  4:57                         ` Trond Myklebust
  2006-09-05  6:45                           ` Ian Kent
  0 siblings, 1 reply; 70+ messages in thread
From: Trond Myklebust @ 2006-09-05  4:57 UTC (permalink / raw)
  To: Ian Kent
  Cc: David Howells, Andrew Morton, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 535 bytes --]

On Tue, 2006-09-05 at 12:06 +0800, Ian Kent wrote:

> > One way to fix this is to simply not hash the dentry when we're doing
> > the O_EXCL intent optimisation, but rather to only hash it _after_ we've
> > successfully created the file on the server. Something like the attached
> > patch ought to do it.
> 
> No.
> 
> This patch simply marks the dentry negative and returns ENOMEM from the
> lookup which, as would be expected, results in this error being returned
> to userspace.

Oops. You are right. I forgot to set res=NULL...



[-- Attachment #2: linux-2.6.18-063-fix_exclusive_create.dif --]
[-- Type: message/rfc822, Size: 1238 bytes --]

From: Trond Myklebust <Trond.Myklebust@netapp.com>
Subject: No Subject
Date: 
Message-ID: <1157432220.32412.40.camel@localhost>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/dir.c |   14 +++++++++++---
 1 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 51328ae..3419c2d 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -904,9 +904,15 @@ static struct dentry *nfs_lookup(struct 
 
 	lock_kernel();
 
-	/* If we're doing an exclusive create, optimize away the lookup */
-	if (nfs_is_exclusive_create(dir, nd))
-		goto no_entry;
+	/*
+	 * If we're doing an exclusive create, optimize away the lookup
+	 * but don't hash the dentry.
+	 */
+	if (nfs_is_exclusive_create(dir, nd)) {
+		d_instantiate(dentry, NULL);
+		res = NULL;
+		goto out_unlock;
+	}
 
 	error = NFS_PROTO(dir)->lookup(dir, &dentry->d_name, &fhandle, &fattr);
 	if (error == -ENOENT)
@@ -1161,6 +1167,8 @@ int nfs_instantiate(struct dentry *dentr
 	if (IS_ERR(inode))
 		return error;
 	d_instantiate(dentry, inode);
+	if (d_unhashed(dentry))
+		d_rehash(dentry);
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-05  4:53                               ` Trond Myklebust
@ 2006-09-05  6:06                                 ` Ian Kent
  2006-09-05  7:01                                   ` Ian Kent
  2006-09-05  9:40                                   ` David Howells
  0 siblings, 2 replies; 70+ messages in thread
From: Ian Kent @ 2006-09-05  6:06 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: David Howells, Andrew Morton, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Tue, 2006-09-05 at 00:53 -0400, Trond Myklebust wrote:
> On Tue, 2006-09-05 at 12:03 +0800, Ian Kent wrote:
> > Sure but this is an old version of autofs which is in use so changing
> > the expected behavior of a system call is not acceptable and I expect
> > other applications may well have a problem with this also.
> 
> Applications that rely on mkdir() to never return EACCES are broken.
> Particularly so in an selinux system (as was the case here).

That's not quite right.

autofs v4 doesn't rely on mkdir never returning EACCES, just that it
returns EEXIST if the directory exists. Nevertheless, if stat behaves
correctly in this case I'll change v4 to do it the way you suggest
(as v5 does already).




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-05  4:57                         ` Trond Myklebust
@ 2006-09-05  6:45                           ` Ian Kent
  2006-09-05  7:07                             ` Ian Kent
  0 siblings, 1 reply; 70+ messages in thread
From: Ian Kent @ 2006-09-05  6:45 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: David Howells, Andrew Morton, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Tue, 2006-09-05 at 00:57 -0400, Trond Myklebust wrote:
> On Tue, 2006-09-05 at 12:06 +0800, Ian Kent wrote:
> 
> > > One way to fix this is to simply not hash the dentry when we're doing
> > > the O_EXCL intent optimisation, but rather to only hash it _after_ we've
> > > successfully created the file on the server. Something like the attached
> > > patch ought to do it.
> > 
> > No.
> > 
> > This patch simply marks the dentry negative and returns ENOMEM from the
> > lookup which, as would be expected, results in this error being returned
> > to userspace.
> 
> Oops. You are right. I forgot to set res=NULL...

Now returns EPERM.




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-05  6:06                                 ` Ian Kent
@ 2006-09-05  7:01                                   ` Ian Kent
  2006-09-05 12:52                                     ` Trond Myklebust
  2006-09-05  9:40                                   ` David Howells
  1 sibling, 1 reply; 70+ messages in thread
From: Ian Kent @ 2006-09-05  7:01 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: David Howells, Andrew Morton, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Tue, 2006-09-05 at 14:06 +0800, Ian Kent wrote:
> On Tue, 2006-09-05 at 00:53 -0400, Trond Myklebust wrote:
> > On Tue, 2006-09-05 at 12:03 +0800, Ian Kent wrote:
> > > Sure but this is an old version of autofs which is in use so changing
> > > the expected behavior of a system call is not acceptable and I expect
> > > other applications may well have a problem with this also.
> > 
> > Applications that rely on mkdir() to never return EACCES are broken.
> > Particularly so in an selinux system (as was the case here).
> 
> That's not quite right.
> 
> > autofs v4 doesn't rely on mkdir never returning EACCES just that it
> > return EEXIST if the directory exists. Nevertheless if the behavior of
> stat will work in this case I'll change v4 to do it the way you suggest
> (as v5 does already). 

Aaah. Wrong again!

Although v5 doesn't attempt to mount an NFS export if the directory
doesn't exist, it does end up doing a mkdir later, as the most common
case is mounting an NFS export within an autofs filesystem or other,
usually local, filesystem.




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-05  6:45                           ` Ian Kent
@ 2006-09-05  7:07                             ` Ian Kent
  0 siblings, 0 replies; 70+ messages in thread
From: Ian Kent @ 2006-09-05  7:07 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: David Howells, Andrew Morton, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Tue, 2006-09-05 at 14:45 +0800, Ian Kent wrote:
> On Tue, 2006-09-05 at 00:57 -0400, Trond Myklebust wrote:
> > On Tue, 2006-09-05 at 12:06 +0800, Ian Kent wrote:
> > 
> > > > One way to fix this is to simply not hash the dentry when we're doing
> > > > the O_EXCL intent optimisation, but rather to only hash it _after_ we've
> > > > successfully created the file on the server. Something like the attached
> > > > patch ought to do it.
> > > 
> > > No.
> > > 
> > > This patch simply marks the dentry negative and returns ENOMEM from the
> > > lookup which, as would be expected, results in this error being returned
> > > to userspace.
> > 
> > Oops. You are right. I forgot to set res=NULL...
> 
> Now returns EPERM.

Sorry, that's EACCES.




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-05  6:06                                 ` Ian Kent
  2006-09-05  7:01                                   ` Ian Kent
@ 2006-09-05  9:40                                   ` David Howells
  2006-09-05 10:20                                     ` Ian Kent
  1 sibling, 1 reply; 70+ messages in thread
From: David Howells @ 2006-09-05  9:40 UTC (permalink / raw)
  To: Ian Kent
  Cc: Trond Myklebust, David Howells, Andrew Morton, torvalds, steved,
	linux-fsdevel, linux-cachefs, nfsv4, linux-kernel

Ian Kent <raven@themaw.net> wrote:

> autofs v4 doesn't rely on mkdir never returning EACCES just that it
> return EEXIST if the directory exists. Nevertheless if the behavior of
> stat will work in this case I'll change v4 to do it the way you suggest
> (as v5 does already). 

As long as you don't rely on stat...mkdir working.  That can go wrong if the
dentry gets booted from the dcache by memory pressure in the "...".

David

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-05  2:55                         ` Ian Kent
  2006-09-05  3:50                           ` Trond Myklebust
@ 2006-09-05  9:48                           ` David Howells
  2006-09-05 10:14                             ` Ian Kent
  1 sibling, 1 reply; 70+ messages in thread
From: David Howells @ 2006-09-05  9:48 UTC (permalink / raw)
  To: Ian Kent
  Cc: Trond Myklebust, David Howells, Andrew Morton, torvalds, steved,
	linux-fsdevel, linux-cachefs, nfsv4, linux-kernel

Ian Kent <raven@themaw.net> wrote:

> Why the hell shouldn't it be able to do an mkdir!

The use of mkdir in this manner has to be considered a bug.  You don't know
that the object at that name on the server is a directory.  It might be a
symbolic link.
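
To illustrate the point, here is a small sketch (not code from autofs or
from this thread) showing that EEXIST from mkdir() says nothing about
the type of the existing object. The names classify_path() and
demo_mkdir_over_symlink() and the /tmp location are invented for the
example; lstat() is used so that a trailing symlink is not followed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

enum path_kind { PATH_MISSING, PATH_DIR, PATH_SYMLINK, PATH_OTHER, PATH_ERROR };

/* Hypothetical helper: say what a name refers to, without following it. */
enum path_kind classify_path(const char *path)
{
	struct stat st;

	assert(path != NULL);
	if (lstat(path, &st) != 0)
		return errno == ENOENT ? PATH_MISSING : PATH_ERROR;
	if (S_ISDIR(st.st_mode))
		return PATH_DIR;
	if (S_ISLNK(st.st_mode))
		return PATH_SYMLINK;
	return PATH_OTHER;
}

/*
 * Demo: put a symlink in a fresh temporary directory, call mkdir() on
 * that name, and report the errno from mkdir() plus what lstat() finds.
 * Returns -1 if the setup failed.
 */
int demo_mkdir_over_symlink(enum path_kind *kind)
{
	char base[] = "/tmp/symlink-demo-XXXXXX";
	char name[64];
	int n, err;

	if (mkdtemp(base) == NULL)
		return -1;
	n = snprintf(name, sizeof(name), "%s/mnt", base);
	assert(n > 0 && (size_t)n < sizeof(name));

	if (symlink("/", name) != 0) {
		rmdir(base);
		return -1;
	}

	/* The name exists (as a symlink), so mkdir() cannot succeed. */
	err = (mkdir(name, 0555) == 0) ? 0 : errno;
	*kind = classify_path(name);

	unlink(name);
	rmdir(base);
	return err;
}
```

A caller that wants a directory to mount on would therefore need
something like classify_path() after seeing EEXIST, rather than taking
EEXIST alone as confirmation.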

David

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-05  1:57                       ` Trond Myklebust
  2006-09-05  2:55                         ` Ian Kent
@ 2006-09-05  9:57                         ` David Howells
  2006-09-05 12:47                           ` Trond Myklebust
  1 sibling, 1 reply; 70+ messages in thread
From: David Howells @ 2006-09-05  9:57 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Ian Kent, David Howells, Andrew Morton, torvalds, steved,
	linux-fsdevel, linux-cachefs, nfsv4, linux-kernel

Trond Myklebust <trond.myklebust@fys.uio.no> wrote:

> Why the hell is it doing a mkdir in the first place?

I think the problems it is solving are these:

 (1) What happens if "/" is _not_ exported?

 (2) What happens if some intermediate directory (say "/usr") is not
     accessible?

In the first case, the automounter just makes "usr" and "usr/src", say, in the
autofs filesystem, and then mounts server:/usr/src on that.

In the second case, the automounter relies on NFS letting it make intervening
directories it couldn't otherwise access to span the gap between "/" and
"src".
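
The pattern described above, walking a path and creating each component
while relying on EEXIST for the ones already present, might look
roughly like the sketch below. This is an illustration only and not the
actual autofs code: mkdir_path() is a made-up name, the 4096-byte
buffer is an arbitrary limit, and the mode is left to the caller (the
strace output earlier in the thread shows 0555).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not autofs code): create every component of an
 * absolute path, treating EEXIST as "already there, carry on".  Any
 * other error, EACCES included, aborts the walk.
 * Returns 0 on success, -1 with errno set on failure.
 */
int mkdir_path(const char *path, mode_t mode)
{
	char buf[4096];		/* arbitrary limit for this sketch */
	size_t len;
	char saved;
	char *p;

	assert(path != NULL);
	len = strlen(path);
	if (len == 0 || len >= sizeof(buf)) {
		errno = ENAMETOOLONG;
		return -1;
	}
	memcpy(buf, path, len + 1);

	/* Cut the string at each '/' in turn and create that prefix. */
	for (p = buf + 1; ; p++) {
		if (*p != '/' && *p != '\0')
			continue;
		saved = *p;
		*p = '\0';
		if (mkdir(buf, mode) != 0 && errno != EEXIST)
			return -1;
		if (saved == '\0')
			break;
		*p = saved;
	}
	return 0;
}
```

With this shape, a component that exists but yields EACCES instead of
EEXIST stops the walk, which matches the failure being discussed.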

David

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-05  9:48                           ` David Howells
@ 2006-09-05 10:14                             ` Ian Kent
  0 siblings, 0 replies; 70+ messages in thread
From: Ian Kent @ 2006-09-05 10:14 UTC (permalink / raw)
  To: David Howells
  Cc: Trond Myklebust, Andrew Morton, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Tue, 2006-09-05 at 10:48 +0100, David Howells wrote:
> Ian Kent <raven@themaw.net> wrote:
> 
> > Why the hell shouldn't it be able to do an mkdir!
> 
> The use of mkdir in this manner has to be considered a bug.  You don't know
> that the object at that name on the server is a directory.  It might be a
> symbolic link.

Fair call.




^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-05  9:40                                   ` David Howells
@ 2006-09-05 10:20                                     ` Ian Kent
  2006-09-05 10:37                                       ` David Howells
  0 siblings, 1 reply; 70+ messages in thread
From: Ian Kent @ 2006-09-05 10:20 UTC (permalink / raw)
  To: David Howells
  Cc: Trond Myklebust, Andrew Morton, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Tue, 2006-09-05 at 10:40 +0100, David Howells wrote:
> Ian Kent <raven@themaw.net> wrote:
> 
> > autofs v4 doesn't rely on mkdir never returning EACCES, just that it
> > returns EEXIST if the directory exists. Nevertheless, if the behavior of
> > stat will work in this case I'll change v4 to do it the way you suggest
> > (as v5 does already).
> 
> As long as you don't rely on stat...mkdir working.  That can go wrong if the
> dentry gets booted from the dcache by memory pressure in the "...".

I'm not clear on your point here.

If I stat a path and it exists then all is good and I'm done.
If I stat a path and I get something other than ENOENT then all is bad
and I return fail.
Otherwise I can just attempt to create the directory and fail if all is
bad with that.

This approach works in the current situation and would work for the
other situations in which that same process is used, not just for NFS
filesystems.
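
The sequence described above could be sketched roughly as follows.  This is
an illustration only, not autofs code: the name ensure_mountpoint() is
hypothetical, and both the S_ISDIR check and the re-check after EEXIST are
assumptions added for the sketch.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not taken from autofs): make sure a mount point
 * directory exists using the stat-then-mkdir sequence.
 * Returns 0 on success, -1 on failure with errno set.
 */
int ensure_mountpoint(const char *path, mode_t mode)
{
	struct stat st;

	if (stat(path, &st) == 0) {
		/* Something exists; assume it must be a directory. */
		if (S_ISDIR(st.st_mode))
			return 0;
		errno = ENOTDIR;
		return -1;
	}

	/* Any failure other than "does not exist" is fatal. */
	if (errno != ENOENT)
		return -1;

	/* Nothing visible there: try to create the directory. */
	if (mkdir(path, mode) == 0)
		return 0;

	/* Someone else may have created it between stat() and mkdir(). */
	if (errno == EEXIST && stat(path, &st) == 0 && S_ISDIR(st.st_mode))
		return 0;

	return -1;
}
```

A caller would then do something like ensure_mountpoint(path, 0755) before
attempting the mount and give up if it returns -1.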

> 
> David



* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-05 10:20                                     ` Ian Kent
@ 2006-09-05 10:37                                       ` David Howells
  2006-09-05 12:20                                         ` Ian Kent
  0 siblings, 1 reply; 70+ messages in thread
From: David Howells @ 2006-09-05 10:37 UTC (permalink / raw)
  To: Ian Kent
  Cc: David Howells, Trond Myklebust, Andrew Morton, torvalds, steved,
	linux-fsdevel, linux-cachefs, nfsv4, linux-kernel

Ian Kent <raven@themaw.net> wrote:

> > As long as you don't rely on stat...mkdir working.  That can go wrong if the
> > dentry gets booted from the dcache by memory pressure in the "...".
> 
> I'm not clear on your point here.

I was wondering if you were going to rely on stat() forcing the dentry to be
correctly initialised before you did mkdir(), but it seems not.

> If I stat a path and it exists then all is good and I'm done.
> If I stat a path and I get something other than ENOENT then all is bad
> and I return fail.
> Otherwise I can just attempt to create the directory and fail if all is
> bad with that.

Okay, I suppose.  But that still doesn't seem to deal with the case of creating
a directory on the client that then overlays a symlink on the server that you
can't yet access.

You may also get ENOENT because you stat a symlink, though you'll get EEXIST
from mkdir, even if there's nothing at the far end.
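
For a dangling symlink that the client can see, stat() follows the link and
fails with ENOENT, while mkdir() on the same name fails with EEXIST.  A
minimal sketch of telling these cases apart with lstat() is below; the enum
and the function name classify_name() are hypothetical, and this only covers
what the client is able to look up.

```c
#include <errno.h>
#include <sys/stat.h>

/* Hypothetical classification of a candidate mount point name. */
enum name_kind {
	NAME_ABSENT,	/* nothing exists under this name */
	NAME_DIR,	/* a directory */
	NAME_SYMLINK,	/* a symlink, dangling or not */
	NAME_OTHER,	/* some other kind of object */
	NAME_ERROR	/* lookup failed for another reason; see errno */
};

enum name_kind classify_name(const char *path)
{
	struct stat st;

	/* lstat() examines the name itself and does not follow a symlink. */
	if (lstat(path, &st) != 0)
		return errno == ENOENT ? NAME_ABSENT : NAME_ERROR;
	if (S_ISLNK(st.st_mode))
		return NAME_SYMLINK;
	if (S_ISDIR(st.st_mode))
		return NAME_DIR;
	return NAME_OTHER;
}
```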

David


* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-05 10:37                                       ` David Howells
@ 2006-09-05 12:20                                         ` Ian Kent
  2006-09-05 13:38                                           ` David Howells
  0 siblings, 1 reply; 70+ messages in thread
From: Ian Kent @ 2006-09-05 12:20 UTC (permalink / raw)
  To: David Howells
  Cc: Trond Myklebust, Andrew Morton, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Tue, 2006-09-05 at 11:37 +0100, David Howells wrote:
> Ian Kent <raven@themaw.net> wrote:
> 
> > > As long as you don't rely on stat...mkdir working.  That can go wrong if the
> > > dentry gets booted from the dcache by memory pressure in the "...".
> > 
> > I'm not clear on your point here.
> 
> I was wondering if you were going to rely on stat() forcing the dentry to be
> correctly initialised before you did mkdir(), but it seems not.
> 
> > If I stat a path and it exists then all is good and I'm done.
> > If I stat a path and I get something other than ENOENT then all is bad
> > and I return fail.
> > Otherwise I can just attempt to create the directory and fail if all is
> > bad with that.
> 
> Okay, I suppose.  But that still doesn't seem to deal with the case of creating
> a directory on the client that then overlays a symlink on the server that you
> can't yet access.

We're largely performing user space actions at this point.
Wouldn't the subsequent call to mount(8) catch that?

> 
> You may also get ENOENT because you stat a symlink, though you'll get EEXIST
> from mkdir, even if there's nothing at the far end.

Don't think this is something I need to care about either.
I can't mount on a symlink so the error return would be the correct way
to deal with it.

Ian




* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-05  9:57                         ` David Howells
@ 2006-09-05 12:47                           ` Trond Myklebust
  2006-09-05 12:53                             ` Trond Myklebust
  2006-09-06 10:27                             ` Ian Kent
  0 siblings, 2 replies; 70+ messages in thread
From: Trond Myklebust @ 2006-09-05 12:47 UTC (permalink / raw)
  To: David Howells
  Cc: Ian Kent, Andrew Morton, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Tue, 2006-09-05 at 10:57 +0100, David Howells wrote:
> Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> 
> > Why the hell is it doing a mkdir in the first place?
> 
> I think the problems it is solving are these:
> 
>  (1) What happens if "/" is _not_ exported?
> 
>  (2) What happens if some intermediate directory (say "/usr") is not
>      accessible?
> 
> 
> In the first case, the automounter just makes "usr" and "usr/src", say, in the
> autofs filesystem, and then mounts server:/usr/src on that.

That is fine. As long as it is doing so in the _autofs_ filesystem. A
call to 'stat()' should suffice to tell if this is the case.

> In the second case, the automounter relies on NFS letting it make intervening
> directories it couldn't otherwise access to span the gap between "/" and
> "src".

If the directory isn't accessible, then autofs shouldn't be trying to
override that. It certainly shouldn't be doing so by trying to create
the directory.



* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-05  7:01                                   ` Ian Kent
@ 2006-09-05 12:52                                     ` Trond Myklebust
  2006-09-06  4:54                                       ` Ian Kent
  0 siblings, 1 reply; 70+ messages in thread
From: Trond Myklebust @ 2006-09-05 12:52 UTC (permalink / raw)
  To: Ian Kent
  Cc: David Howells, Andrew Morton, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Tue, 2006-09-05 at 15:01 +0800, Ian Kent wrote:

> > autofs v4 doesn't rely on mkdir never returning EACCES, just that it
> > returns EEXIST if the directory exists. Nevertheless, if the behavior of
> > stat will work in this case I'll change v4 to do it the way you suggest
> > (as v5 does already).
> 
> Aaah. Wrong again!
> 
> Although v5 doesn't attempt to mount an NFS export if the directory
> doesn't exist, it does end up doing a mkdir later, as the most common case
> is mounting an NFS export within an autofs filesystem or other, usually
> local, filesystem.

Then make sure that it only does this within the autofs filesystem. The
value of f_type as returned by statfs() should be able to tell you if
this is the case.
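
A minimal sketch of such a check, assuming Linux: path_is_on_autofs() is a
hypothetical name, and the fallback definition of AUTOFS_SUPER_MAGIC is an
assumption that mirrors the value in the kernel's <linux/magic.h>.

```c
#include <sys/vfs.h>	/* statfs() on Linux */

/* Assumed to match the definition in <linux/magic.h>. */
#ifndef AUTOFS_SUPER_MAGIC
#define AUTOFS_SUPER_MAGIC 0x0187
#endif

/*
 * Hypothetical helper: returns 1 if path lives on an autofs filesystem,
 * 0 if it lives on some other filesystem, -1 if statfs() fails.
 */
int path_is_on_autofs(const char *path)
{
	struct statfs fs;

	if (statfs(path, &fs) != 0)
		return -1;
	return fs.f_type == AUTOFS_SUPER_MAGIC;
}
```

The daemon would then only go on to mkdir() when this returns 1 for the
parent directory.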



* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-05 12:47                           ` Trond Myklebust
@ 2006-09-05 12:53                             ` Trond Myklebust
  2006-09-05 13:40                               ` David Howells
  2006-09-06 10:27                             ` Ian Kent
  1 sibling, 1 reply; 70+ messages in thread
From: Trond Myklebust @ 2006-09-05 12:53 UTC (permalink / raw)
  To: David Howells
  Cc: Ian Kent, Andrew Morton, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Tue, 2006-09-05 at 08:47 -0400, Trond Myklebust wrote:
> On Tue, 2006-09-05 at 10:57 +0100, David Howells wrote:
> > Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> > 
> > > Why the hell is it doing a mkdir in the first place?
> > 
> > I think the problems it is solving are these:
> > 
> >  (1) What happens if "/" is _not_ exported?
> > 
> >  (2) What happens if some intermediate directory (say "/usr") is not
> >      accessible?
> > 
> > 
> > In the first case, the automounter just makes "usr" and "usr/src", say, in the
> > autofs filesystem, and then mounts server:/usr/src on that.
> 
> That is fine. As long as it is doing so in the _autofs_ filesystem. A
> call to 'stat()' should suffice to tell if this is the case.

I meant statfs().




* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-05 12:20                                         ` Ian Kent
@ 2006-09-05 13:38                                           ` David Howells
  2006-09-06  4:58                                             ` Ian Kent
  0 siblings, 1 reply; 70+ messages in thread
From: David Howells @ 2006-09-05 13:38 UTC (permalink / raw)
  To: Ian Kent
  Cc: David Howells, Trond Myklebust, Andrew Morton, torvalds, steved,
	linux-fsdevel, linux-cachefs, nfsv4, linux-kernel

Ian Kent <raven@themaw.net> wrote:

> > Okay, I suppose.  But that still doesn't seem to deal with the case of
> > creating a directory on the client that then overlays a symlink on the
> > server that you can't yet access.
> 
> We're largely performing user space actions at this point.
> Wouldn't the subsequent call to mount(8) catch that?

Not if you've already caused the NFS filesystem to create a "dummy" dentry
that's a directory because you couldn't see that what that name corresponds to
on the server is actually a symlink.

> > You may also get ENOENT because you stat a symlink, though you'll get EEXIST
> > from mkdir, even if there's nothing at the far end.
> 
> Don't think this is something I need to care about either.
> I can't mount on a symlink so the error return would be the correct way
> to deal with it.

But you might have to transit a symlink to reach the mountpoint.

David


* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-05 12:53                             ` Trond Myklebust
@ 2006-09-05 13:40                               ` David Howells
  0 siblings, 0 replies; 70+ messages in thread
From: David Howells @ 2006-09-05 13:40 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: David Howells, Ian Kent, Andrew Morton, torvalds, steved,
	linux-fsdevel, linux-cachefs, nfsv4, linux-kernel

Trond Myklebust <trond.myklebust@fys.uio.no> wrote:

> > That is fine. As long as it is doing so in the _autofs_ filesystem. A
> > call to 'stat()' should suffice to tell if this is the case.
> 
> I meant statfs().

stat() too: st_dev.
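
That is, st_dev could be compared against that of the autofs mount root.  A
rough sketch, with the hypothetical name same_filesystem() and the
assumption that the caller already knows the path of the autofs root:

```c
#include <sys/stat.h>

/*
 * Hypothetical helper: returns 1 if path and fs_root report the same
 * st_dev, 0 if they differ, -1 if either stat() call fails.
 */
int same_filesystem(const char *path, const char *fs_root)
{
	struct stat a, b;

	if (stat(path, &a) != 0 || stat(fs_root, &b) != 0)
		return -1;
	return a.st_dev == b.st_dev;
}
```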

David


* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-05 12:52                                     ` Trond Myklebust
@ 2006-09-06  4:54                                       ` Ian Kent
  0 siblings, 0 replies; 70+ messages in thread
From: Ian Kent @ 2006-09-06  4:54 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: David Howells, Andrew Morton, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Tue, 2006-09-05 at 08:52 -0400, Trond Myklebust wrote:
> On Tue, 2006-09-05 at 15:01 +0800, Ian Kent wrote:
> 
> > > autofs v4 doesn't rely on mkdir never returning EACCES, just that it
> > > returns EEXIST if the directory exists. Nevertheless, if the behavior of
> > > stat will work in this case I'll change v4 to do it the way you suggest
> > > (as v5 does already).
> > 
> > Aaah. Wrong again!
> > 
> > Although v5 doesn't attempt to mount an NFS export if the directory
> > doesn't exist, it does end up doing a mkdir later, as the most common case
> > is mounting an NFS export within an autofs filesystem or other, usually
> > local, filesystem.
> 
> Then make sure that it only does this within the autofs filesystem. The
> value of f_type as returned by statfs() should be able to tell you if
> this is the case.
> 

Yep. I've done that elsewhere.

Ian




* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-05 13:38                                           ` David Howells
@ 2006-09-06  4:58                                             ` Ian Kent
  2006-09-06  9:51                                               ` David Howells
  0 siblings, 1 reply; 70+ messages in thread
From: Ian Kent @ 2006-09-06  4:58 UTC (permalink / raw)
  To: David Howells
  Cc: Trond Myklebust, Andrew Morton, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Tue, 2006-09-05 at 14:38 +0100, David Howells wrote:
> Ian Kent <raven@themaw.net> wrote:
> 
> > > Okay, I suppose.  But that still doesn't seem to deal with the case of
> > > creating a directory on the client that then overlays a symlink on the
> > > server that you can't yet access.
> > 
> > We're largely performing user space actions at this point.
> > Wouldn't the subsequent call to mount(8) catch that?
> 
> Not if you've already caused the NFS filesystem to create a "dummy" dentry
> that's a directory because you couldn't see that what that name corresponds to
> on the server is actually a symlink.

Shouldn't stat tell me if this is a symlink?

> 
> > > You may also get ENOENT because you stat a symlink, though you'll get EEXIST
> > > from mkdir, even if there's nothing at the far end.
> > 
> > Don't think this is something I need to care about either.
> > I can't mount on a symlink so the error return would be the correct way
> > to deal with it.
> 
> But you might have to transit a symlink to reach the mountpoint.

Mmmm ... thinking ....

Ian




* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-06  4:58                                             ` Ian Kent
@ 2006-09-06  9:51                                               ` David Howells
  2006-09-06 12:46                                                 ` Trond Myklebust
  0 siblings, 1 reply; 70+ messages in thread
From: David Howells @ 2006-09-06  9:51 UTC (permalink / raw)
  To: Ian Kent
  Cc: David Howells, Trond Myklebust, Andrew Morton, torvalds, steved,
	linux-fsdevel, linux-cachefs, nfsv4, linux-kernel

Ian Kent <raven@themaw.net> wrote:

> > Not if you've already caused the NFS filesystem to create a "dummy" dentry
> > that's a directory because you couldn't see that what that name
> > corresponds to on the server is actually a symlink.
> 
> Shouldn't stat tell me if this is a symlink?

You may not be able to find out from the server what it is you're trying to
deal with because you may not have permission to do so, or because whatever it
is may not be exported.  The first may be the trickiest to deal with because
the MOUNT service for NFS2 and NFS3 can jump you over bits of the path you
can't otherwise access.

The problem actually comes when the conditions on the server change; perhaps an
intermediate directory is made accessible on the server and suddenly the client
can see inside of it.  It may then find out that what it had assumed to be
directories, and what it had set dummy directory dentries up for, aren't.

David


* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-05 12:47                           ` Trond Myklebust
  2006-09-05 12:53                             ` Trond Myklebust
@ 2006-09-06 10:27                             ` Ian Kent
  1 sibling, 0 replies; 70+ messages in thread
From: Ian Kent @ 2006-09-06 10:27 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: David Howells, Andrew Morton, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Tue, 2006-09-05 at 08:47 -0400, Trond Myklebust wrote:
> On Tue, 2006-09-05 at 10:57 +0100, David Howells wrote:
> > Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> > 
> > > Why the hell is it doing a mkdir in the first place?
> > 
> > I think the problems it is solving are these:
> > 
> >  (1) What happens if "/" is _not_ exported?
> > 
> >  (2) What happens if some intermediate directory (say "/usr") is not
> >      accessible?
> > 
> > 
> > In the first case, the automounter just makes "usr" and "usr/src", say, in the
> > autofs filesystem, and then mounts server:/usr/src on that.
> 
> That is fine. As long as it is doing so in the _autofs_ filesystem. A
> call to 'stat()' should suffice to tell if this is the case.
> 
> > In the second case, the automounter relies on NFS letting it make intervening
> > directories it couldn't otherwise access to span the gap between "/" and
> > "src".
> 
> If the directory isn't accessible, then autofs shouldn't be trying to
> override that. It certainly shouldn't be doing so by trying to create
> the directory.
> 

In the case above the directory is in the autofs filesystem and so needs
to be created.

Ian




* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-06  9:51                                               ` David Howells
@ 2006-09-06 12:46                                                 ` Trond Myklebust
  2006-09-06 13:24                                                   ` David Howells
  0 siblings, 1 reply; 70+ messages in thread
From: Trond Myklebust @ 2006-09-06 12:46 UTC (permalink / raw)
  To: David Howells
  Cc: Ian Kent, Andrew Morton, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Wed, 2006-09-06 at 10:51 +0100, David Howells wrote:
> Ian Kent <raven@themaw.net> wrote:
> 
> > > Not if you've already caused the NFS filesystem to create a "dummy" dentry
> > > that's a directory because you couldn't see that what that name
> > > corresponds to on the server is actually a symlink.
> > 
> > Shouldn't stat tell me if this is a symlink?
> 
> You may not be able to find out from the server what it is you're trying to
> deal with because you may not have permission to do so, or because whatever it
> is may not be exported.  The first may be the trickiest to deal with because
> the MOUNT service for NFS2 and NFS3 can jump you over bits of the path you
> can't otherwise access.
> 
> The problem actually comes when the conditions on the server change; perhaps an
> intermediate directory is made accessible on the server and suddenly the client
> can see inside of it.  It may then find out that what it had assumed to be
> directories, and what it had set dummy directory dentries up for, aren't.

It really doesn't matter whether there is a symlink or not. automounters
should _not_ be trying to create directories on any filesystem other
than the autofs filesystem itself.




* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-06 12:46                                                 ` Trond Myklebust
@ 2006-09-06 13:24                                                   ` David Howells
  2006-09-07  5:30                                                     ` Ian Kent
  0 siblings, 1 reply; 70+ messages in thread
From: David Howells @ 2006-09-06 13:24 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: David Howells, Ian Kent, Andrew Morton, torvalds, steved,
	linux-fsdevel, linux-cachefs, nfsv4, linux-kernel

Trond Myklebust <trond.myklebust@fys.uio.no> wrote:

> It really doesn't matter whether there is a symlink or not. automounters
> should _not_ be trying to create directories on any filesystem other
> than the autofs filesystem itself.

Yes, I agree.

David


* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-06 13:24                                                   ` David Howells
@ 2006-09-07  5:30                                                     ` Ian Kent
  2006-09-07  6:17                                                       ` Trond Myklebust
  0 siblings, 1 reply; 70+ messages in thread
From: Ian Kent @ 2006-09-07  5:30 UTC (permalink / raw)
  To: David Howells
  Cc: Trond Myklebust, Andrew Morton, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Wed, 2006-09-06 at 14:24 +0100, David Howells wrote:
> Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> 
> > It really doesn't matter whether there is a symlink or not. automounters
> > should _not_ be trying to create directories on any filesystem other
> > than the autofs filesystem itself.
> 
> Yes, I agree.

Not really.

What about multiple recursive bind mounts?
What about the initial directory for the autofs mount itself?

What about the case where an admin expects autofs to create these
directories for map entries that have multiple offsets?

As I've said before, in version 5 I'm saying that it is a requirement
that the directories already exist in this case, but in version 4
people may have become accustomed to this behavior and, right or wrong,
this type of change shouldn't be made without warning to the users or
possibly not made at all.

Ian


* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-07  5:30                                                     ` Ian Kent
@ 2006-09-07  6:17                                                       ` Trond Myklebust
  2006-09-07  7:40                                                         ` Ian Kent
  0 siblings, 1 reply; 70+ messages in thread
From: Trond Myklebust @ 2006-09-07  6:17 UTC (permalink / raw)
  To: Ian Kent
  Cc: David Howells, Andrew Morton, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Thu, 2006-09-07 at 13:30 +0800, Ian Kent wrote:
> On Wed, 2006-09-06 at 14:24 +0100, David Howells wrote:
> > Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> > 
> > > It really doesn't matter whether there is a symlink or not. automounters
> > > should _not_ be trying to create directories on any filesystem other
> > > than the autofs filesystem itself.
> > 
> > Yes, I agree.
> 
> Not really.
> 
> What about multiple recursive bind mounts?
> What about the initial directory for the autofs mount itself?
> 
> What about the case where an admin expects autofs to create these
> directories for map entries that have multiple offsets?
> 
> As I've said before, in version 5 I'm saying that it is a requirement
> that the directories already exist in this case, but in version 4
> people may have become accustomed to this behavior and, right or wrong,
> this type of change shouldn't be made without warning to the users or
> possibly not made at all.

What part of the phrase "security risk" are you failing to understand?
If anybody out there is actually relying on having an automounter daemon
that is running with root privileges try to create directories on remote
servers on the basis of the output of the 'showmount' command, then they
need saving from themselves.



* Re: [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13]
  2006-09-07  6:17                                                       ` Trond Myklebust
@ 2006-09-07  7:40                                                         ` Ian Kent
  0 siblings, 0 replies; 70+ messages in thread
From: Ian Kent @ 2006-09-07  7:40 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: David Howells, Andrew Morton, torvalds, steved, linux-fsdevel,
	linux-cachefs, nfsv4, linux-kernel

On Thu, 2006-09-07 at 02:17 -0400, Trond Myklebust wrote:
> On Thu, 2006-09-07 at 13:30 +0800, Ian Kent wrote:
> > On Wed, 2006-09-06 at 14:24 +0100, David Howells wrote:
> > > Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> > > 
> > > > It really doesn't matter whether there is a symlink or not. automounters
> > > > should _not_ be trying to create directories on any filesystem other
> > > > than the autofs filesystem itself.
> > > 
> > > Yes, I agree.
> > 
> > Not really.
> > 
> > What about multiple recursive bind mounts?
> > What about the initial directory for the autofs mount itself?
> > 
> > What about the case where an admin expects autofs to create these
> > directories for map entries that have multiple offsets?
> > 
> > As I've said before, in version 5 I'm saying that it is a requirement
> > that the directories already exist in this case, but in version 4
> > people may have become accustomed to this behavior and, right or wrong,
> > this type of change shouldn't be made without warning to the users or
> > possibly not made at all.
> 
> What part of the phrase "security risk" are you failing to understand?
> If anybody out there is actually relying on having an automounter daemon
> that is running with root privileges try to create directories on remote
> servers on the basis of the output of the 'showmount' command, then they
> need saving from themselves.
> 

Haha ... point taken.



end of thread

Thread overview: 70+ messages
2006-08-30 19:31 [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #13] David Howells
2006-08-30 19:31 ` [PATCH 1/7] FS-Cache: Provide a filesystem-specific sync'able page bit " David Howells
2006-08-30 19:32 ` [PATCH 3/7] FS-Cache: Release page->private after failed readahead " David Howells
2006-08-30 19:32 ` [PATCH 4/7] FS-Cache: Make kAFS use FS-Cache " David Howells
2006-08-30 19:32 ` [PATCH 5/7] NFS: Use local caching " David Howells
2006-08-30 19:32 ` [PATCH 6/7] FS-Cache: CacheFiles: ia64: missing copy_page export " David Howells
2006-08-30 19:52 ` [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing " Andrew Morton
2006-08-30 20:37   ` David Howells
2006-08-30 20:55     ` Andrew Morton
2006-08-31  9:58       ` David Howells
2006-08-31 17:21         ` Andrew Morton
2006-08-31 17:26           ` Trond Myklebust
2006-08-31 17:42           ` David Howells
2006-08-31 18:04             ` Andrew Morton
2006-09-01 13:08           ` David Howells
2006-09-01 16:34             ` Andrew Morton
2006-09-01 17:00               ` Trond Myklebust
2006-09-02  2:50                 ` Andrew Morton
2006-09-02  4:11                   ` Ian Kent
2006-09-02  5:58                     ` Andrew Morton
2006-09-03  6:21                       ` Ian Kent
2006-09-03  6:30                         ` Andrew Morton
2006-09-03  6:43                           ` Ian Kent
2006-09-03 16:58                             ` Andrew Morton
2006-09-04  2:23                               ` Ian Kent
2006-09-04  5:40                               ` Ian Kent
2006-09-02  4:49                   ` Ian Kent
2006-09-04 11:52                   ` David Howells
2006-09-04 11:52                   ` David Howells
2006-09-04 13:24                     ` Ian Kent
2006-09-04 13:46                       ` David Howells
2006-09-04 15:00                         ` Ian Kent
2006-09-05  4:11                         ` Ian Kent
2006-09-05  4:17                           ` Trond Myklebust
2006-09-05  1:57                       ` Trond Myklebust
2006-09-05  2:55                         ` Ian Kent
2006-09-05  3:50                           ` Trond Myklebust
2006-09-05  4:03                             ` Ian Kent
2006-09-05  4:53                               ` Trond Myklebust
2006-09-05  6:06                                 ` Ian Kent
2006-09-05  7:01                                   ` Ian Kent
2006-09-05 12:52                                     ` Trond Myklebust
2006-09-06  4:54                                       ` Ian Kent
2006-09-05  9:40                                   ` David Howells
2006-09-05 10:20                                     ` Ian Kent
2006-09-05 10:37                                       ` David Howells
2006-09-05 12:20                                         ` Ian Kent
2006-09-05 13:38                                           ` David Howells
2006-09-06  4:58                                             ` Ian Kent
2006-09-06  9:51                                               ` David Howells
2006-09-06 12:46                                                 ` Trond Myklebust
2006-09-06 13:24                                                   ` David Howells
2006-09-07  5:30                                                     ` Ian Kent
2006-09-07  6:17                                                       ` Trond Myklebust
2006-09-07  7:40                                                         ` Ian Kent
2006-09-05  9:48                           ` David Howells
2006-09-05 10:14                             ` Ian Kent
2006-09-05  9:57                         ` David Howells
2006-09-05 12:47                           ` Trond Myklebust
2006-09-05 12:53                             ` Trond Myklebust
2006-09-05 13:40                               ` David Howells
2006-09-06 10:27                             ` Ian Kent
2006-09-05  2:23                     ` Trond Myklebust
2006-09-05  3:01                       ` Ian Kent
2006-09-05  4:05                         ` Trond Myklebust
2006-09-05  4:06                       ` Ian Kent
2006-09-05  4:57                         ` Trond Myklebust
2006-09-05  6:45                           ` Ian Kent
2006-09-05  7:07                             ` Ian Kent
2006-09-04 18:20               ` David Howells
