[PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #12]

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #12]
@ 2006-08-18 15:35 David Howells
  2006-08-18 15:35 ` [PATCH 1/7] FS-Cache: Provide a filesystem-specific sync'able page bit " David Howells
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: David Howells @ 2006-08-18 15:35 UTC (permalink / raw)
  To: torvalds, akpm, steved, trond.myklebust
  Cc: linux-fsdevel, linux-cachefs, nfsv4, linux-kernel



These patches add local caching for network filesystems such as NFS and AFS.

The patches can be grouped as:

 (A) 01-04

     Filesystem caching, including support for AFS.

 (B) 05

     Filesystem caching support for NFS; depends on (A) and upon the superblock
     sharing patches in Trond's tree.

 (C) 06-07

     CacheFiles: cache on files backend; depends on (A).

Note to Andrew Morton: I have not included the 64-bit inode number patches, the
dentry destruction patches or the NFS superblock sharing fix patches in this
patch set.

---
Changes in [try #12]:

 (*) [PATCH] FS-Cache: Release page->private after failed readahead
     [PATCH] FS-Cache: Make kAFS use FS-Cache
     [PATCH] NFS: Use local caching

     [*] Use invalidatepage() rather than releasepage() to forcibly invalidate
     	 a page that we failed to add to the pagecache.

     [*] Make AFS and NFS's releasepage() ops return false if the page is busy
     	 interacting with the cache rather than waiting for the cache
     	 interaction to complete.  invalidatepage() still waits.

 (*) [PATCH] FS-Cache: CacheFiles: A cache that backs onto a mounted filesystem

     [*] Fix a printk format warning.

 (*) [PATCH] NFS: Use local caching

     [*] Handle upstream changes to base version of nfs_release_page().

Changes in [try #11]:

 (*) Split up of the NFS superblock sharing patches into a set of smaller
     patches and reworked some of the contents as per Trond's suggestions.

 (*) [PATCH] NFS: Fix error handling

     [*] Fix error handling in earlier patches (the earlier patches are also in
     	 Trond's NFS tree, so I haven't rolled this in for the moment).

 (*) [PATCH] NFS: Secure the roots of the NFS subtrees in a shared superblock

     [*] Initialise the security on detached NFS roots manually since they're
     	 allocated with dcache_alloc_anon() not dcache_alloc_root().

 (*) [PATCH] FS-Cache: CacheFiles: A cache that backs onto a mounted filesystem

     [*] Don't use file structs when accessing the data storage backing files.
     	 Pass NULL as the file argument to prepare_write() and commit_write()
     	 calls.

     [*] Check for a bmap() inode op to prevent NFS being used as the cache
     	 backing store (and besides, we need bmap() available anyway).

     [*] Make the calls to the statfs() superblock op supply a dentry not a
     	 vfsmount.

     [*] CONFIG_CACHEFILES_DEBUG permits _enter(), _debug() and _exit() to be
     	 enabled dynamically.

     [*] debugging macros are checked by gcc for printf format compliance even
     	 when completely disabled.

 (*) [PATCH] FS-Cache: CacheFiles: ia64: missing copy_page export

     [*] Export copy_page() on IA-64 as we need that.

 (*) [PATCH] AUTOFS: Make sure all dentries refs are released before calling kill_anon_super()

     [*] Make sure autofs4 releases all its retained dentries in its kill_sb()
     	 op before calling kill_anon_super() rather than in the put_super() op.
     	 This prevents the next patch from oopsing it.

 (*) [PATCH] VFS: Destroy the dentries contributed by a superblock on unmounting

     [*] Optimise the destruction of the dentries attached to a superblock
     	 during unmounting.

Changes in [try #10]:

 (*) [PATCH] NFS: Permit filesystem to perform statfs with a known root dentry

     [*] Pass a dentry rather than a vfsmount to the statfs() op as the key by
     	 which to determine the filesystem.

 (*) [PATCH] NFS: Share NFS superblocks per-protocol per-server per-FSID

     [*] nfs4_pathname_string() needed an extra const.

 (*) [PATCH] FS-Cache: Release page->private in failed readahead

     [*] The comment header on the helper function is much expanded.  This
     	 states why there's a need to call the releasepage() op in the event of
     	 an error.

     [*] BUG() if the page is already locked when we try and lock it.

     [*] Don't set the page mapping pointer until we've locked the page.

     [*] The page is unlocked after try_to_release_page() is called.

 (*) The release-page patch now comes before the fscache-afs patch as well as
     the fscache-nfs patch.

Changes in [try #9]:

 (*) [PATCH] NFS: Permit filesystem to perform statfs with a known root dentry

     [*] Inclusions of linux/mount.h have been added where necessary to make
       	 allyesconfig build successfully.

 (*) [PATCH] NFS: Share NFS superblocks per-protocol per-server per-FSID

     [*] The exports from fs/namespace.c and fs/namei.c are no longer required.

 (*) [PATCH] FS-Cache: Release page->private in failed readahead

     [*] The try_to_release_page() is called instead of calling the
     	 releasepage() op directly.

     [*] The page is locked before try_to_release_page() is called.

     [*] The call to try_to_release_page() and page_cache_release() have been
     	 abstracted out into a helper function as this bit of code occurs
     	 twice..

---
In response to those who've asked, there are at least three reasons for
implementing superblock sharing:

 (1) As I understand what I've been told, NFSv4 requires a 1:1 mapping between
     server files and client files.  I suspect this has to do with the
     management of leases.

 (2) We can reduce the resource consumption on NFSv2 and NFSv3 clients as well
     as on NFSv4 clients by sharing superblocks that cover overlapping segments
     of the file space.

     Consider a machine that's used by a lot of people at the same time, each
     of whom has an automounted NFS homedir off of the same server - and in
     fact off of the same disk on the that server.  Currently, with Linus's
     tree, each one will get a separate superblock to represent them; with
     Trond's tree, each one will still get a separate superblock unless they
     share the same root filehandle; and with my patches, they'll get the same
     superblock.

     If two homedirs have a hard link between them (unlikely, I know, but by no
     means impossible, and probably more likely with, say, data such as NFS
     mounted git repositories), then you have the possibility of aliasing.
     This means that you can have two or more inodes in core that refer to the
     same server object, and each of these inodes can have pages that refer to
     the same remote pages on the server - aliasing again.  You _have_ to have
     two inodes because they're covered by separate superblocks.

     Aliasing is bad, generally, because you end up using more storage than
     you need to (pagecache and inode cache in this case), and you have the
     problem of keeping them in sync.  It's also twice as hard to keep two
     inodes up to date when they change on the server as to keep one up to
     date.

     If you can use the same superblock where possible, then you can cut out
     aliasing on that client since you can share dentries that have the same
     file handle (hard links or subtrees).

     Part of the problem with NFSv2 and NFSv3 is that you invoke mountd to get
     the filehandle to a subtree, but you may not be able to work out how two
     different subtrees relate.  The getsb patch permits the superblock to
     have more than one root, which allows us to defer this problem until we
     see the root of one subtree cropping up in another subtree - at which
     point we can splice the former into the latter.

 (3) In my local file caching patches (FS-Cache), I have two reasons for
     wanting this:

     (a) Unique keys.  I need a unique key to find an object in the cache.  If
     	 we can get inode aliases, then I end up with several inodes referring
     	 to the same cache object.  This also means that I have to use a fair
     	 bit of extra memory to keep track of the multiple cookie mappings in
     	 FS-Cache, and have to compare keys a lot to find duplicate mappings.

	 If I can assume that the _netfs_ will manage the 1:1 mapping, I can
	 use a lot less memory and save some processing capacity also.

	 I don't want to invent random keys to differentiate aliased
	 superblocks or inodes as that destroys the persistence capabilities
	 of the cache across power failures and reboots.

     (b) Callbacks.  I want a callback that the netfs passes to FS-Cache to
     	 permit the cache to update the metadata in the cache from netfs
     	 metadata at convenient times.  However, if there's more than one
     	 inode alias in core, which one should the cache use?

AFS doesn't have anything like these problems because mounts are always made
from the root of a volume, and AFS was designed with local caching in mind.

The getsb and statfs patches are a consequence of NFS being permitted to mount
arbitrary subtrees from the server.

David

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/7] FS-Cache: Provide a filesystem-specific sync'able page bit [try #12]
  2006-08-18 15:35 [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #12] David Howells
@ 2006-08-18 15:35 ` David Howells
  2006-08-18 15:35 ` [PATCH 2/7] FS-Cache: Generic filesystem caching facility " David Howells
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: David Howells @ 2006-08-18 15:35 UTC (permalink / raw)
  To: torvalds, akpm, steved, trond.myklebust
  Cc: linux-fsdevel, linux-cachefs, nfsv4, linux-kernel

The attached patch provides a filesystem-specific page bit that a filesystem
can synchronise upon.  This can be used, for example, by a netfs to synchronise
with CacheFS writing its pages to disk.

The PG_checked bit is replaced with PG_fs_misc, and various operations are
provided based upon that.  The *PageChecked() macros have also been replaced.

Signed-Off-By: David Howells <dhowells@redhat.com>
---

 fs/afs/dir.c               |    5 +----
 fs/ext2/dir.c              |    6 +++---
 fs/ext3/inode.c            |   10 +++++-----
 fs/freevxfs/vxfs_subr.c    |    2 +-
 fs/reiserfs/inode.c        |   10 +++++-----
 fs/ufs/dir.c               |    6 +++---
 include/linux/page-flags.h |   15 ++++++++++-----
 include/linux/pagemap.h    |   11 +++++++++++
 mm/filemap.c               |   17 +++++++++++++++++
 mm/migrate.c               |    4 ++--
 mm/page_alloc.c            |    2 +-
 11 files changed, 59 insertions(+), 29 deletions(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index cf8a2cb..f1c965f 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -155,11 +155,9 @@ #endif
 		}
 	}
 
-	SetPageChecked(page);
 	return;
 
  error:
-	SetPageChecked(page);
 	SetPageError(page);
 
 } /* end afs_dir_check_page() */
@@ -191,8 +189,7 @@ static struct page *afs_dir_get_page(str
 		kmap(page);
 		if (!PageUptodate(page))
 			goto fail;
-		if (!PageChecked(page))
-			afs_dir_check_page(dir, page);
+		afs_dir_check_page(dir, page);
 		if (PageError(page))
 			goto fail;
 	}
diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index 92ea826..b0e7d9f 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -112,7 +112,7 @@ static void ext2_check_page(struct page 
 	if (offs != limit)
 		goto Eend;
 out:
-	SetPageChecked(page);
+	SetPageFsMisc(page);
 	return;
 
 	/* Too bad, we had an error */
@@ -152,7 +152,7 @@ Eend:
 		dir->i_ino, (page->index<<PAGE_CACHE_SHIFT)+offs,
 		(unsigned long) le32_to_cpu(p->inode));
 fail:
-	SetPageChecked(page);
+	SetPageFsMisc(page);
 	SetPageError(page);
 }
 
@@ -165,7 +165,7 @@ static struct page * ext2_get_page(struc
 		kmap(page);
 		if (!PageUptodate(page))
 			goto fail;
-		if (!PageChecked(page))
+		if (!PageFsMisc(page))
 			ext2_check_page(page);
 		if (PageError(page))
 			goto fail;
diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index c5ee9f0..81ebf9b 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -1527,12 +1527,12 @@ static int ext3_journalled_writepage(str
 		goto no_write;
 	}
 
-	if (!page_has_buffers(page) || PageChecked(page)) {
+	if (!page_has_buffers(page) || PageFsMisc(page)) {
 		/*
 		 * It's mmapped pagecache.  Add buffers and journal it.  There
 		 * doesn't seem much point in redirtying the page here.
 		 */
-		ClearPageChecked(page);
+		ClearPageFsMisc(page);
 		ret = block_prepare_write(page, 0, PAGE_CACHE_SIZE,
 					ext3_get_block);
 		if (ret != 0) {
@@ -1589,7 +1589,7 @@ static void ext3_invalidatepage(struct p
 	 * If it's a full truncate we just forget about the pending dirtying
 	 */
 	if (offset == 0)
-		ClearPageChecked(page);
+		ClearPageFsMisc(page);
 
 	journal_invalidatepage(journal, page, offset);
 }
@@ -1598,7 +1598,7 @@ static int ext3_releasepage(struct page 
 {
 	journal_t *journal = EXT3_JOURNAL(page->mapping->host);
 
-	WARN_ON(PageChecked(page));
+	WARN_ON(PageFsMisc(page));
 	if (!page_has_buffers(page))
 		return 0;
 	return journal_try_to_free_buffers(journal, page, wait);
@@ -1694,7 +1694,7 @@ out:
  */
 static int ext3_journalled_set_page_dirty(struct page *page)
 {
-	SetPageChecked(page);
+	SetPageFsMisc(page);
 	return __set_page_dirty_nobuffers(page);
 }
 
diff --git a/fs/freevxfs/vxfs_subr.c b/fs/freevxfs/vxfs_subr.c
index decac62..805bbb2 100644
--- a/fs/freevxfs/vxfs_subr.c
+++ b/fs/freevxfs/vxfs_subr.c
@@ -78,7 +78,7 @@ vxfs_get_page(struct address_space *mapp
 		kmap(pp);
 		if (!PageUptodate(pp))
 			goto fail;
-		/** if (!PageChecked(pp)) **/
+		/** if (!PageFsMisc(pp)) **/
 			/** vxfs_check_page(pp); **/
 		if (PageError(pp))
 			goto fail;
diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index 52f1e21..af4dd36 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -2344,7 +2344,7 @@ static int reiserfs_write_full_page(stru
 	struct buffer_head *head, *bh;
 	int partial = 0;
 	int nr = 0;
-	int checked = PageChecked(page);
+	int checked = PageFsMisc(page);
 	struct reiserfs_transaction_handle th;
 	struct super_block *s = inode->i_sb;
 	int bh_per_page = PAGE_CACHE_SIZE / s->s_blocksize;
@@ -2422,7 +2422,7 @@ static int reiserfs_write_full_page(stru
 	 * blocks we're going to log
 	 */
 	if (checked) {
-		ClearPageChecked(page);
+		ClearPageFsMisc(page);
 		reiserfs_write_lock(s);
 		error = journal_begin(&th, s, bh_per_page + 1);
 		if (error) {
@@ -2803,7 +2803,7 @@ static void reiserfs_invalidatepage(stru
 	BUG_ON(!PageLocked(page));
 
 	if (offset == 0)
-		ClearPageChecked(page);
+		ClearPageFsMisc(page);
 
 	if (!page_has_buffers(page))
 		goto out;
@@ -2844,7 +2844,7 @@ static int reiserfs_set_page_dirty(struc
 {
 	struct inode *inode = page->mapping->host;
 	if (reiserfs_file_data_log(inode)) {
-		SetPageChecked(page);
+		SetPageFsMisc(page);
 		return __set_page_dirty_nobuffers(page);
 	}
 	return __set_page_dirty_buffers(page);
@@ -2867,7 +2867,7 @@ static int reiserfs_releasepage(struct p
 	struct buffer_head *bh;
 	int ret = 1;
 
-	WARN_ON(PageChecked(page));
+	WARN_ON(PageFsMisc(page));
 	spin_lock(&j->j_dirty_buffers_lock);
 	head = page_buffers(page);
 	bh = head;
diff --git a/fs/ufs/dir.c b/fs/ufs/dir.c
index 7f0a0aa..e04327c 100644
--- a/fs/ufs/dir.c
+++ b/fs/ufs/dir.c
@@ -135,7 +135,7 @@ static void ufs_check_page(struct page *
 	if (offs != limit)
 		goto Eend;
 out:
-	SetPageChecked(page);
+	SetPageFsMisc(page);
 	return;
 
 	/* Too bad, we had an error */
@@ -173,7 +173,7 @@ Eend:
 		   "offset=%lu",
 		   dir->i_ino, (page->index<<PAGE_CACHE_SHIFT)+offs);
 fail:
-	SetPageChecked(page);
+	SetPageFsMisc(page);
 	SetPageError(page);
 }
 
@@ -187,7 +187,7 @@ static struct page *ufs_get_page(struct 
 		kmap(page);
 		if (!PageUptodate(page))
 			goto fail;
-		if (!PageChecked(page))
+		if (!PageFsMisc(page))
 			ufs_check_page(page);
 		if (PageError(page))
 			goto fail;
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 5748642..6e017b7 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -71,7 +71,7 @@ #define PG_lru			 5
 #define PG_active		 6
 #define PG_slab			 7	/* slab debug (Suparna wants this) */
 
-#define PG_checked		 8	/* kill me in 2.5.<early>. */
+#define PG_fs_misc		 8
 #define PG_arch_1		 9
 #define PG_reserved		10
 #define PG_private		11	/* Has something at ->private */
@@ -161,10 +161,6 @@ #else
 #define PageHighMem(page)	0 /* needed to optimize away at compile time */
 #endif
 
-#define PageChecked(page)	test_bit(PG_checked, &(page)->flags)
-#define SetPageChecked(page)	set_bit(PG_checked, &(page)->flags)
-#define ClearPageChecked(page)	clear_bit(PG_checked, &(page)->flags)
-
 #define PageReserved(page)	test_bit(PG_reserved, &(page)->flags)
 #define SetPageReserved(page)	set_bit(PG_reserved, &(page)->flags)
 #define ClearPageReserved(page)	clear_bit(PG_reserved, &(page)->flags)
@@ -263,4 +259,13 @@ static inline void set_page_writeback(st
 	test_set_page_writeback(page);
 }
 
+/*
+ * Filesystem-specific page bit testing
+ */
+#define PageFsMisc(page)		test_bit(PG_fs_misc, &(page)->flags)
+#define SetPageFsMisc(page)		set_bit(PG_fs_misc, &(page)->flags)
+#define TestSetPageFsMisc(page)		test_and_set_bit(PG_fs_misc, &(page)->flags)
+#define ClearPageFsMisc(page)		clear_bit(PG_fs_misc, &(page)->flags)
+#define TestClearPageFsMisc(page)	test_and_clear_bit(PG_fs_misc, &(page)->flags)
+
 #endif	/* PAGE_FLAGS_H */
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 0a2f5d2..82b2753 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -170,6 +170,17 @@ static inline void wait_on_page_writebac
 extern void end_page_writeback(struct page *page);
 
 /*
+ * Wait for filesystem-specific page synchronisation to complete
+ */
+static inline void wait_on_page_fs_misc(struct page *page)
+{
+	if (PageFsMisc(page))
+		wait_on_page_bit(page, PG_fs_misc);
+}
+
+extern void fastcall end_page_fs_misc(struct page *page);
+
+/*
  * Fault a userspace page into pagetables.  Return non-zero on a fault.
  *
  * This assumes that two userspace pages are always sufficient.  That's
diff --git a/mm/filemap.c b/mm/filemap.c
index b9a60c4..2e365d4 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -577,6 +577,23 @@ void fastcall __lock_page(struct page *p
 }
 EXPORT_SYMBOL(__lock_page);
 
+/*
+ * Note completion of filesystem specific page synchronisation
+ *
+ * This is used to allow a page to be written to a filesystem cache in the
+ * background without holding up the completion of readpage
+ */
+void fastcall end_page_fs_misc(struct page *page)
+{
+	smp_mb__before_clear_bit();
+	if (!TestClearPageFsMisc(page))
+		BUG();
+	smp_mb__after_clear_bit();
+	__wake_up_bit(page_waitqueue(page), &page->flags, PG_fs_misc);
+}
+
+EXPORT_SYMBOL(end_page_fs_misc);
+
 /**
  * find_get_page - find and get a page reference
  * @mapping: the address_space to search
diff --git a/mm/migrate.c b/mm/migrate.c
index 3f1e0c2..08c9fff 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -348,8 +348,8 @@ static void migrate_page_copy(struct pag
 		SetPageUptodate(newpage);
 	if (PageActive(page))
 		SetPageActive(newpage);
-	if (PageChecked(page))
-		SetPageChecked(newpage);
+	if (PageFsMisc(page))
+		SetPageFsMisc(newpage);
 	if (PageMappedToDisk(page))
 		SetPageMappedToDisk(newpage);
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 54a4f53..8bb003b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -550,7 +550,7 @@ static int prep_new_page(struct page *pa
 
 	page->flags &= ~(1 << PG_uptodate | 1 << PG_error |
 			1 << PG_referenced | 1 << PG_arch_1 |
-			1 << PG_checked | 1 << PG_mappedtodisk);
+			1 << PG_fs_misc | 1 << PG_mappedtodisk);
 	set_page_private(page, 0);
 	set_page_refcounted(page);
 	kernel_map_pages(page, 1 << order, 1);

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/7] FS-Cache: Generic filesystem caching facility [try #12]
  2006-08-18 15:35 [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #12] David Howells
  2006-08-18 15:35 ` [PATCH 1/7] FS-Cache: Provide a filesystem-specific sync'able page bit " David Howells
@ 2006-08-18 15:35 ` David Howells
  2006-08-18 15:35 ` [PATCH 3/7] FS-Cache: Release page->private after failed readahead " David Howells
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: David Howells @ 2006-08-18 15:35 UTC (permalink / raw)
  To: torvalds, akpm, steved, trond.myklebust
  Cc: linux-fsdevel, linux-cachefs, nfsv4, linux-kernel

The attached patch adds a generic intermediary (FS-Cache) by which filesystems
may call on local caching capabilities, and by which local caching backends may
make caches available:

	+---------+
	|         |                        +--------------+
	|   NFS   |--+                     |              |
	|         |  |                 +-->|   CacheFS    |
	+---------+  |   +----------+  |   |  /dev/hda5   |
	             |   |          |  |   +--------------+
	+---------+  +-->|          |  |
	|         |      |          |--+
	|   AFS   |----->| FS-Cache |
	|         |      |          |--+
	+---------+  +-->|          |  |
	             |   |          |  |   +--------------+
	+---------+  |   +----------+  |   |              |
	|         |  |                 +-->|  CacheFiles  |
	|  ISOFS  |--+                     |  /var/cache  |
	|         |                        +--------------+
	+---------+

The patch also documents the netfs interface and the cache backend
interface provided by the facility.


There are a number of reasons why I'm not using i_mapping to do this.
These have been discussed a lot on the LKML and CacheFS mailing lists,
but to summarise the basics:

 (1) Most filesystems don't do hole reportage.  Holes in files are treated as
     blocks of zeros and can't be distinguished otherwise, making it difficult
     to distinguish blocks that have been read from the network and cached from
     those that haven't.

 (2) The backing inode must be fully populated before being exposed to
     userspace through the main inode because the VM/VFS goes directly to the
     backing inode and does not interrogate the front inode on VM ops.

     Therefore:

     (a) The backing inode must fit entirely within the cache.

     (b) All backed files currently open must fit entirely within the cache at
     	 the same time.

     (c) A working set of files in total larger than the cache may not be
     	 cached.

     (d) A file may not grow larger than the available space in the cache.

     (e) A file that's open and cached, and remotely grows larger than the
     	 cache is potentially stuffed.

 (3) Writes go to the backing filesystem, and can only be transferred to the
     network when the file is closed.

 (4) There's no record of what changes have been made, so the whole file must
     be written back.

 (5) The pages belong to the backing filesystem, and all metadata associated
     with that page are relevant only to the backing filesystem, and not
     anything stacked atop it.


The attached patch adds a generic core to which both networking filesystems and
caches may bind.  It transfers requests from networking filesystems to
appropriate caches if possible, or else gracefully denies them.

If this facility is disabled in the kernel configuration, then all its
operations will be trivially reducible to nothing by the compiler.

FS-Cache provides the following facilities:

 (1) Caches can be added / removed at any time, even whilst in use.

 (2) Adds a facility by which tags can be used to refer to caches, even if
     they're not mounted yet.

 (3) More than one cache can be used at once.  Caches can be selected
     explicitly by use of tags.

 (4) The netfs is provided with an interface that allows either party to
     withdraw caching facilities from a file (required for (1)).

 (5) A netfs may annotate cache objects that belongs to it.

 (6) Cache objects can be pinned and reservations made.

 (7) The interface to the netfs returns as few errors as possible, preferring
     rather to let the netfs remain oblivious.

 (8) Cookies are used to represent indices, files and other objects to the
     netfs.  The simplest cookie is just a NULL pointer - indicating nothing
     cached there.

 (9) The netfs is allowed to propose - dynamically - any index hierarchy it
     desires, though it must be aware that the index search function is
     recursive, stack space is limited, and indices can only be children of
     indices.

(10) Indices can be used to group files together to reduce key size and to make
     group invalidation easier.  The use of indices may make lookup quicker,
     but that's cache dependent.

(11) Data I/O is effectively done directly to and from the netfs's pages.  The
     netfs indicates that page A is at index B of the data-file represented by
     cookie C, and that it should be read or written.  The cache backend may or
     may not start I/O on that page, but if it does, a netfs callback will be
     invoked to indicate completion.  The I/O may be either synchronous or
     asynchronous.

(12) Cookies can be "retired" upon release.  At this point FS-Cache will mark
     them as obsolete and the index hierarchy rooted at that point will get
     recycled.

(13) The netfs provides a "match" function for index searches.  In addition to
     saying whether a match was made or not, this can also specify that an
     entry should be updated or deleted.


FS-Cache maintains a virtual indexing tree in which all indices, files, objects
and pages are kept.  Bits of this tree may actually reside in one or more
caches.

                                           FSDEF
                                             |
                        +------------------------------------+
                        |                                    |
                       NFS                                  AFS
                        |                                    |
           +--------------------------+                +-----------+
           |                          |                |           |
        homedir                     mirror          afs.org   redhat.com
           |                          |                            |
     +------------+           +---------------+              +----------+
     |            |           |               |              |          |
   00001        00002       00007           00125        vol00001   vol00002
     |            |           |               |                         |
 +---+---+     +-----+      +---+      +------+------+            +-----+----+
 |   |   |     |     |      |   |      |      |      |            |     |    |
PG0 PG1 PG2   PG0  XATTR   PG0 PG1   DIRENT DIRENT DIRENT        R/W   R/O  Bak
                     |                                            |
                    PG0                                       +-------+
                                                              |       |
                                                            00001   00003
                                                              |
                                                          +---+---+
                                                          |   |   |
                                                         PG0 PG1 PG2

In the example above, you can see two netfs's being backed: NFS and AFS.  These
have different index hierarchies:

 (*) The NFS primary index will probably contain per-server indices.  Each
     server index is indexed by NFS file handles to get data file objects.
     Each data file objects can have an array of pages, but may also have
     further child objects, such as extended attributes and directory entries.
     Extended attribute objects themselves have page-array contents.

 (*) The AFS primary index contains per-cell indices.  Each cell index contains
     per-logical-volume indices.  Each of volume index contains up to three
     indices for the read-write, read-only and backup mirrors of those volumes.
     Each of these contains vnode data file objects, each of which contains an
     array of pages.

The very top index is the FS-Cache master index in which individual netfs's
have entries.

Any index object may reside in more than one cache, provided it only has index
children.  Any index with non-index object children will be assumed to only
reside in one cache.


The FS-Cache overview can be found in:

	Documentation/filesystems/caching/fscache.txt

The netfs API to FS-Cache can be found in:

	Documentation/filesystems/caching/netfs-api.txt

The cache backend API to FS-Cache can be found in:

	Documentation/filesystems/caching/backend-api.txt


Further changes [try #11] that have been made:

 (*) FSCACHE_NEGATIVE_COOKIE has been removed.  NULL should be used instead.

 (*) Add read/write context maintenance for in case the cache operation fails.

Signed-Off-By: David Howells <dhowells@redhat.com>
---

 Documentation/filesystems/caching/backend-api.txt |  357 +++++++
 Documentation/filesystems/caching/fscache.txt     |  151 +++
 Documentation/filesystems/caching/netfs-api.txt   |  752 +++++++++++++++
 fs/Kconfig                                        |   15 
 fs/Makefile                                       |    1 
 fs/fscache/Makefile                               |   11 
 fs/fscache/cookie.c                               | 1045 +++++++++++++++++++++
 fs/fscache/fscache-int.h                          |   93 ++
 fs/fscache/fsdef.c                                |  110 ++
 fs/fscache/main.c                                 |  105 ++
 fs/fscache/page.c                                 |  537 +++++++++++
 include/linux/fscache-cache.h                     |  243 +++++
 include/linux/fscache.h                           |  496 ++++++++++
 13 files changed, 3916 insertions(+), 0 deletions(-)

diff --git a/Documentation/filesystems/caching/backend-api.txt b/Documentation/filesystems/caching/backend-api.txt
new file mode 100644
index 0000000..1dd601b
--- /dev/null
+++ b/Documentation/filesystems/caching/backend-api.txt
@@ -0,0 +1,357 @@
+			  ==========================
+			  FS-CACHE CACHE BACKEND API
+			  ==========================
+
+The FS-Cache system provides an API by which actual caches can be supplied to
+FS-Cache for it to then serve out to network filesystems and other interested
+parties.
+
+This API is declared in <linux/fscache-cache.h>.
+
+
+====================================
+INITIALISING AND REGISTERING A CACHE
+====================================
+
+To start off, a cache definition must be initialised and registered for each
+cache the backend wants to make available.  For instance, CacheFS does this in
+the fill_super() operation on mounting.
+
+The cache definition (struct fscache_cache) should be initialised by calling:
+
+	void fscache_init_cache(struct fscache_cache *cache,
+				struct fscache_cache_ops *ops,
+				const char *idfmt,
+				...)
+
+Where:
+
+ (*) "cache" is a pointer to the cache definition;
+
+ (*) "ops" is a pointer to the table of operations that the backend supports on
+     this cache;
+
+ (*) and a format and printf-style arguments for constructing a label for the
+     cache.
+
+
+The cache should then be registered with FS-Cache by passing a pointer to the
+previously initialised cache definition to:
+
+	int fscache_add_cache(struct fscache_cache *cache,
+			      struct fscache_object *fsdef,
+			      const char *tagname);
+
+Two extra arguments should also be supplied:
+
+ (*) "fsdef" which should point to the object representation for the FS-Cache
+     master index in this cache.  Netfs primary index entries will be created
+     here.
+
+ (*) "tagname" which, if given, should be a text string naming this cache.  If
+     this is NULL, the identifier will be used instead.  For CacheFS, the
+     identifier is set to name the underlying block device and the tag can be
+     supplied by mount.
+
+This function may return -ENOMEM if it ran out of memory or -EEXIST if the tag
+is already in use.  0 will be returned on success.
+
+
+=====================
+UNREGISTERING A CACHE
+=====================
+
+A cache can be withdrawn from the system by calling this function with a
+pointer to the cache definition:
+
+	void fscache_withdraw_cache(struct fscache_cache *cache)
+
+In CacheFS's case, this is called by put_super().
+
+
+==================
+FS-CACHE UTILITIES
+==================
+
+FS-Cache provides some utilities that a cache backend may make use of:
+
+ (*) Find the parent of an object:
+
+	struct fscache_object *
+	fscache_find_parent_object(struct fscache_object *object)
+
+     This allows a backend to find the logical parent of an index or data file
+     in the cache hierarchy.
+
+ (*) Note occurrence of an I/O error in a cache:
+
+	void fscache_io_error(struct fscache_cache *cache)
+
+     This tells FS-Cache that an I/O error occurred in the cache.  After this
+     has been called, only resource dissociation operations (object and page
+     release) will be passed from the netfs to the cache backend for the
+     specified cache.
+
+     This does not actually withdraw the cache.  That must be done separately.
+
+ (*) Get an extra reference to a read or write context:
+
+	void *fscache_get_context(struct fscache_cookie *cookie, void *context)
+
+     and release a reference:
+
+	void *fscache_put_context(struct fscache_cookie *cookie, void *context)
+
+     These should be used to maintain the presence of the read or write context
+     passed to the cache read/write functions.  This context must then be
+     passed to the I/O completion function.
+
+
+========================
+RELEVANT DATA STRUCTURES
+========================
+
+ (*) Index/Data file FS-Cache representation cookie:
+
+	struct fscache_cookie {
+		struct fscache_object_def	*def;
+		struct fscache_netfs		*netfs;
+		void				*netfs_data;
+		...
+	};
+
+     The fields that might be of use to the backend describe the object
+     definition, the netfs definition and the netfs's data for this cookie.
+     The object definition contain functions supplied by the netfs for loading
+     and matching index entries; these are required to provide some of the
+     cache operations.
+
+ (*) In-cache object representation:
+
+	struct fscache_object {
+		struct fscache_cache		*cache;
+		struct fscache_cookie		*cookie;
+		unsigned long			flags;
+	#define FSCACHE_OBJECT_RECYCLING	1
+		...
+	};
+
+     Structures of this type should be allocated by the cache backend and
+     passed to FS-Cache when requested by the appropriate cache operation.  In
+     the case of CacheFS, they're embedded in CacheFS's internal object
+     structures.
+
+     Each object contains a pointer to the cookie that represents the object it
+     is backing.  It also contains a flag that indicates whether the object is
+     being retired when put_object() is called.  This should be initialised by
+     calling fscache_object_init(object).
+
+
+================
+CACHE OPERATIONS
+================
+
+The cache backend provides FS-Cache with a table of operations that can be
+performed on the denizens of the cache.  These are held in a structure of type:
+
+	struct fscache_cache_ops
+
+ (*) Name of cache provider [mandatory]:
+
+	const char *name
+
+     This isn't strictly an operation, but should be pointed at a string naming
+     the backend.
+
+ (*) Object lookup [mandatory]:
+
+	struct fscache_object *(*lookup_object)(struct fscache_cache *cache,
+						struct fscache_object *parent,
+						struct fscache_cookie *cookie)
+
+     This method is used to look up an object in the specified cache, given a
+     pointer to the parent object and the cookie to which the object will be
+     attached.  This should instantiate that object in the cache if it can, or
+     return -ENOBUFS or -ENOMEM if it can't.
+
+ (*) Increment object refcount [mandatory]:
+
+	struct fscache_object *(*grab_object)(struct fscache_object *object)
+
+     This method is called to increment the reference count on an object.  It
+     may fail (for instance if the cache is being withdrawn) by returning NULL.
+     It should return the object pointer if successful.
+
+ (*) Lock/Unlock object [mandatory]:
+
+	void (*lock_object)(struct fscache_object *object)
+	void (*unlock_object)(struct fscache_object *object)
+
+     These methods are used to exclusively lock an object.  It must be possible
+     to schedule with the lock held, so a spinlock isn't sufficient.
+
+ (*) Pin/Unpin object [optional]:
+
+	int (*pin_object)(struct fscache_object *object)
+	void (*unpin_object)(struct fscache_object *object)
+
+     These methods are used to pin an object into the cache.  Once pinned an
+     object cannot be reclaimed to make space.  Return -ENOSPC if there's not
+     enough space in the cache to permit this.
+
+ (*) Update object [mandatory]:
+
+	int (*update_object)(struct fscache_object *object)
+
+     This is called to update the index entry for the specified object.  The
+     new information should be in object->cookie->netfs_data.  This can be
+     obtained by calling object->cookie->def->get_aux()/get_attr().
+
+ (*) Release object reference [mandatory]:
+
+	void (*put_object)(struct fscache_object *object)
+
+     This method is used to discard a reference to an object.  The object may
+     be destroyed when all the references held by FS-Cache are released.
+
+ (*) Synchronise a cache [mandatory]:
+
+	void (*sync)(struct fscache_cache *cache)
+
+     This is called to ask the backend to synchronise a cache with its backing
+     device.
+
+ (*) Dissociate a cache [mandatory]:
+
+	void (*dissociate_pages)(struct fscache_cache *cache)
+
+     This is called to ask a cache to perform any page dissociations as part of
+     cache withdrawal.
+
+ (*) Set the data size on a cache file [mandatory]:
+
+	int (*set_i_size)(struct fscache_object *object, loff_t i_size);
+
+     This is called to indicate to the cache the maximum size a file may reach.
+     The cache may use this to reserve space on the cache.  It may also return
+     -ENOBUFS to indicate that insufficient space is available to expand the
+     metadata used to track the data.  It should return 0 if successful or
+     -ENOMEM or -EIO on error.
+
+ (*) Reserve cache space for an object's data [optional]:
+
+	int (*reserve_space)(struct fscache_object *object, loff_t size);
+
+     This is called to request that cache space be reserved to hold the data
+     for an object and the metadata used to track it.  Zero size should be
+     taken as request to cancel a reservation.
+
+     This should return 0 if successful, -ENOSPC if there isn't enough space
+     available, or -ENOMEM or -EIO on other errors.
+
+     The reservation may exceed the size of the object, thus permitting future
+     expansion.  If the amount of space consumed by an object would exceed the
+     reservation, it's permitted to refuse requests to allocate pages, but not
+     required.  An object may be pruned down to its reservation size if larger
+     than that already.
+
+ (*) Request page be read from cache [mandatory]:
+
+	int (*read_or_alloc_page)(struct fscache_object *object,
+				  struct page *page,
+				  fscache_rw_complete_t end_io_func,
+				  void *end_io_data,
+				  gfp_t gfp)
+
+     This is called to attempt to read a netfs page from the cache, or to
+     reserve a backing block if not.  FS-Cache will have done as much checking
+     as it can before calling, but most of the work belongs to the backend.
+
+     If there's no page in the cache, then -ENODATA should be returned if the
+     backend managed to reserve a backing block; -ENOBUFS, -ENOMEM or -EIO if
+     it didn't.
+
+     If there is a page in the cache, then a read operation should be queued
+     and 0 returned.  When the read finishes, end_io_func() should be called
+     with the following arguments:
+
+	(*end_io_func)(object->cookie->netfs_data,
+		       page,
+		       end_io_data,
+		       error);
+
+     The mark_pages_cached() cookie operation should be called for the page if
+     any cache metadata is retained.  This will indicate to the netfs that the
+     page needs explicit uncaching.  This operation takes a pagevec, thus
+     allowing several pages to be marked at once.
+
+ (*) Request pages be read from cache [mandatory]:
+
+	int (*read_or_alloc_pages)(struct fscache_object *object,
+				   struct address_space *mapping,
+				   struct list_head *pages,
+				   unsigned *nr_pages,
+				   fscache_rw_complete_t end_io_func,
+				   void *end_io_data,
+				   gfp_t gfp)
+
+     This is like the previous operation, except it will be handed a list of
+     pages instead of one page.  Any pages on which a read operation is started
+     must be added to the page cache for the specified mapping and also to the
+     LRU.  Such pages must also be removed from the pages list and nr_pages
+     decremented per page.
+
+     If there was an error such as -ENOMEM, then that should be returned; else
+     if one or more pages couldn't be read or allocated, then -ENOBUFS should
+     be returned; else if one or more pages couldn't be read, then -ENODATA
+     should be returned.  If all the pages are dispatched then 0 should be
+     returned.
+
+ (*) Request page be allocated in the cache [mandatory]:
+
+	int (*allocate_page)(struct fscache_object *object,
+			     struct page *page,
+			     gfp_t gfp)
+
+     This is like read_or_alloc_page(), except that it shouldn't read from the
+     cache, even if there's data there that could be retrieved.  It should,
+     however, set up any internal metadata required such that write_page() can
+     write to the cache.
+
+     If there's no backing block available, then -ENOBUFS should be returned
+     (or -ENOMEM or -EIO if there were other problems).  If a block is
+     successfully allocated, then the netfs page should be marked and 0
+     returned.
+
+ (*) Request page be written to cache [mandatory]:
+
+	int (*write_page)(struct fscache_object *object,
+			  struct page *page,
+			  fscache_rw_complete_t end_io_func,
+			  void *end_io_data,
+			  gfp_t gfp)
+
+     This is called to write from a page on which there was a previously
+     successful read_or_alloc_page() call.  FS-Cache filters out pages that
+     don't have mappings.
+
+     If there's no backing block available, then -ENOBUFS should be returned
+     (or -ENOMEM or -EIO if there were other problems).
+
+     If the write operation could be queued, then 0 should be returned.  When
+     the write completes, end_io_func() should be called with the following
+     arguments:
+
+	(*end_io_func)(object->cookie->netfs_data,
+		       page,
+		       end_io_data,
+		       error);
+
+ (*) Discard retained per-page metadata [mandatory]:
+
+	void (*uncache_pages)(struct fscache_object *object,
+			      struct pagevec *pagevec)
+
+     This is called when one or more netfs pages are being evicted from the
+     pagecache.  The cache backend should tear down any internal representation
+     or tracking it maintains.
diff --git a/Documentation/filesystems/caching/fscache.txt b/Documentation/filesystems/caching/fscache.txt
new file mode 100644
index 0000000..82c3168
--- /dev/null
+++ b/Documentation/filesystems/caching/fscache.txt
@@ -0,0 +1,151 @@
+			  ==========================
+			  General Filesystem Caching
+			  ==========================
+
+========
+OVERVIEW
+========
+
+This facility is a general purpose cache for network filesystems, though it
+could be used for caching other things such as ISO9660 filesystems too.
+
+FS-Cache mediates between cache backends (such as CacheFS) and network
+filesystems:
+
+	+---------+
+	|         |                        +--------------+
+	|   NFS   |--+                     |              |
+	|         |  |                 +-->|   CacheFS    |
+	+---------+  |   +----------+  |   |  /dev/hda5   |
+	             |   |          |  |   +--------------+
+	+---------+  +-->|          |  |
+	|         |      |          |--+
+	|   AFS   |----->| FS-Cache |
+	|         |      |          |--+
+	+---------+  +-->|          |  |
+	             |   |          |  |   +--------------+
+	+---------+  |   +----------+  |   |              |
+	|         |  |                 +-->|  CacheFiles  |
+	|  ISOFS  |--+                     |  /var/cache  |
+	|         |                        +--------------+
+	+---------+
+
+
+FS-Cache does not follow the idea of completely loading every netfs file
+opened in its entirety into a cache before permitting it to be accessed and
+then serving the pages out of that cache rather than the netfs inode because:
+
+ (1) It must be practical to operate without a cache.
+
+ (2) The size of any accessible file must not be limited to the size of the
+     cache.
+
+ (3) The combined size of all opened files (this includes mapped libraries)
+     must not be limited to the size of the cache.
+
+ (4) The user should not be forced to download an entire file just to do a
+     one-off access of a small portion of it (such as might be done with the
+     "file" program).
+
+It instead serves the cache out in PAGE_SIZE chunks as and when requested by
+the netfs('s) using it.
+
+
+FS-Cache provides the following facilities:
+
+ (1) More than one cache can be used at once.  Caches can be selected
+     explicitly by use of tags.
+
+ (2) Caches can be added / removed at any time.
+
+ (3) The netfs is provided with an interface that allows either party to
+     withdraw caching facilities from a file (required for (2)).
+
+ (4) The interface to the netfs returns as few errors as possible, preferring
+     rather to let the netfs remain oblivious.
+
+ (5) Cookies are used to represent indices, files and other objects to the
+     netfs.  The simplest cookie is just a NULL pointer - indicating nothing
+     cached there.
+
+ (6) The netfs is allowed to propose - dynamically - any index hierarchy it
+     desires, though it must be aware that the index search function is
+     recursive, stack space is limited, and indices can only be children of
+     indices.
+
+ (7) Data I/O is done direct to and from the netfs's pages.  The netfs
+     indicates that page A is at index B of the data-file represented by cookie
+     C, and that it should be read or written.  The cache backend may or may
+     not start I/O on that page, but if it does, a netfs callback will be
+     invoked to indicate completion.  The I/O may be either synchronous or
+     asynchronous.
+
+ (8) Cookies can be "retired" upon release.  At this point FS-Cache will mark
+     them as obsolete and the index hierarchy rooted at that point will get
+     recycled.
+
+ (9) The netfs provides a "match" function for index searches.  In addition to
+     saying whether a match was made or not, this can also specify that an
+     entry should be updated or deleted.
+
+
+FS-Cache maintains a virtual indexing tree in which all indices, files, objects
+and pages are kept.  Bits of this tree may actually reside in one or more
+caches.
+
+                                           FSDEF
+                                             |
+                        +------------------------------------+
+                        |                                    |
+                       NFS                                  AFS
+                        |                                    |
+           +--------------------------+                +-----------+
+           |                          |                |           |
+        homedir                     mirror          afs.org   redhat.com
+           |                          |                            |
+     +------------+           +---------------+              +----------+
+     |            |           |               |              |          |
+   00001        00002       00007           00125        vol00001   vol00002
+     |            |           |               |                         |
+ +---+---+     +-----+      +---+      +------+------+            +-----+----+
+ |   |   |     |     |      |   |      |      |      |            |     |    |
+PG0 PG1 PG2   PG0  XATTR   PG0 PG1   DIRENT DIRENT DIRENT        R/W   R/O  Bak
+                     |                                            |
+                    PG0                                       +-------+
+                                                              |       |
+                                                            00001   00003
+                                                              |
+                                                          +---+---+
+                                                          |   |   |
+                                                         PG0 PG1 PG2
+
+In the example above, you can see two netfs's being backed: NFS and AFS.  These
+have different index hierarchies:
+
+ (*) The NFS primary index contains per-server indices.  Each server index is
+     indexed by NFS file handles to get data file objects.  Each data file
+     objects can have an array of pages, but may also have further child
+     objects, such as extended attributes and directory entries.  Extended
+     attribute objects themselves have page-array contents.
+
+ (*) The AFS primary index contains per-cell indices.  Each cell index contains
+     per-logical-volume indices.  Each of volume index contains up to three
+     indices for the read-write, read-only and backup mirrors of those volumes.
+     Each of these contains vnode data file objects, each of which contains an
+     array of pages.
+
+The very top index is the FS-Cache master index in which individual netfs's
+have entries.
+
+Any index object may reside in more than one cache, provided it only has index
+children.  Any index with non-index object children will be assumed to only
+reside in one cache.
+
+
+The netfs API to FS-Cache can be found in:
+
+	Documentation/filesystems/caching/netfs-api.txt
+
+The cache backend API to FS-Cache can be found in:
+
+	Documentation/filesystems/caching/backend-api.txt
diff --git a/Documentation/filesystems/caching/netfs-api.txt b/Documentation/filesystems/caching/netfs-api.txt
new file mode 100644
index 0000000..a1a182b
--- /dev/null
+++ b/Documentation/filesystems/caching/netfs-api.txt
@@ -0,0 +1,752 @@
+			===============================
+			FS-CACHE NETWORK FILESYSTEM API
+			===============================
+
+There's an API by which a network filesystem can make use of the FS-Cache
+facilities.  This is based around a number of principles:
+
+ (1) Caches can store a number of different object types.  There are two main
+     object types: indices and files.  The first is a special type used by
+     FS-Cache to make finding objects faster and to make retiring of groups of
+     objects easier.
+
+ (2) Every index, file or other object is represented by a cookie.  This cookie
+     may or may not have anything associated with it, but the netfs doesn't
+     need to care.
+
+ (3) Barring the top-level index (one entry per cached netfs), the index
+     hierarchy for each netfs is structured according the whim of the netfs.
+
+This API is declared in <linux/fscache.h>.
+
+This document contains the following sections:
+
+	 (1) Network filesystem definition
+	 (2) Index definition
+	 (3) Object definition
+	 (4) Network filesystem (un)registration
+	 (5) Cache tag lookup
+	 (6) Index registration
+	 (7) Data file registration
+	 (8) Miscellaneous object registration
+	 (9) Setting the data file size
+	(10) Page alloc/read/write
+	(11) Page uncaching
+	(12) Index and data file update
+	(13) Miscellaneous cookie operations
+	(14) Cookie unregistration
+	(15) Index and data file invalidation
+
+
+=============================
+NETWORK FILESYSTEM DEFINITION
+=============================
+
+FS-Cache needs a description of the network filesystem.  This is specified
+using a record of the following structure:
+
+	struct fscache_netfs {
+		uint32_t			version;
+		const char			*name;
+		struct fscache_netfs_operations	*ops;
+		struct fscache_cookie		*primary_index;
+		...
+	};
+
+This first three fields should be filled in before registration, and the fourth
+will be filled in by the registration function; any other fields should just be
+ignored and are for internal use only.
+
+The fields are:
+
+ (1) The name of the netfs (used as the key in the toplevel index).
+
+ (2) The version of the netfs (if the name matches but the version doesn't, the
+     entire in-cache hierarchy for this netfs will be scrapped and begun
+     afresh).
+
+ (3) The operations table is defined as follows:
+
+	struct fscache_netfs_operations {
+	};
+
+     Currently there aren't any functions here.
+
+ (4) The cookie representing the primary index will be allocated according to
+     another parameter passed into the registration function.
+
+For example, kAFS (linux/fs/afs/) uses the following definitions to describe
+itself:
+
+	static struct fscache_netfs_operations afs_cache_ops = {
+	};
+
+	struct fscache_netfs afs_cache_netfs = {
+		.version	= 0,
+		.name		= "afs",
+		.ops		= &afs_cache_ops,
+	};
+
+
+================
+INDEX DEFINITION
+================
+
+Indices are used for two purposes:
+
+ (1) To aid the finding of a file based on a series of keys (such as AFS's
+     "cell", "volume ID", "vnode ID").
+
+ (2) To make it easier to discard a subset of all the files cached based around
+     a particular key - for instance to mirror the removal of an AFS volume.
+
+However, since it's unlikely that any two netfs's are going to want to define
+their index hierarchies in quite the same way, FS-Cache tries to impose as few
+restraints as possible on how an index is structured and where it is placed in
+the tree.  The netfs can even mix indices and data files at the same level, but
+it's not recommended.
+
+Each index entry consists of a key of indeterminate length plus some auxilliary
+data, also of indeterminate length.
+
+There are some limits on indices:
+
+ (1) Any index containing non-index objects should be restricted to a single
+     cache.  Any such objects created within an index will be created in the
+     first cache only.  The cache in which an index is created can be
+     controlled by cache tags (see below).
+
+ (2) The entry data must be atomically journallable, so it is limited to about
+     400 bytes at present.  At least 400 bytes will be available.
+
+ (3) The depth of the index tree should be judged with care as the search
+     function is recursive.  Too many layers will run the kernel out of stack.
+
+
+=================
+OBJECT DEFINITION
+=================
+
+To define an object, a structure of the following type should be filled out:
+
+	struct fscache_object_def
+	{
+		uint8_t name[16];
+		uint8_t type;
+
+		struct fscache_cache_tag *(*select_cache)(
+			const void *parent_netfs_data,
+			const void *cookie_netfs_data);
+
+		uint16_t (*get_key)(const void *cookie_netfs_data,
+				    void *buffer,
+				    uint16_t bufmax);
+
+		void (*get_attr)(const void *cookie_netfs_data,
+				 uint64_t *size);
+
+		uint16_t (*get_aux)(const void *cookie_netfs_data,
+				    void *buffer,
+				    uint16_t bufmax);
+
+		fscache_checkaux_t (*check_aux)(void *cookie_netfs_data,
+						const void *data,
+						uint16_t datalen);
+
+		void (*get_context)(void *cookie_netfs_data, void *context);
+
+		void (*put_context)(void *cookie_netfs_data, void *context);
+
+		void (*mark_pages_cached)(void *cookie_netfs_data,
+					  struct address_space *mapping,
+					  struct pagevec *cached_pvec);
+
+		void (*now_uncached)(void *cookie_netfs_data);
+	};
+
+This has the following fields:
+
+ (1) The type of the object [mandatory].
+
+     This is one of the following values:
+
+	(*) FSCACHE_COOKIE_TYPE_INDEX
+
+	    This defines an index, which is a special FS-Cache type.
+
+	(*) FSCACHE_COOKIE_TYPE_DATAFILE
+
+	    This defines an ordinary data file.
+
+	(*) Any other value between 2 and 255
+
+	    This defines an extraordinary object such as an XATTR.
+
+ (2) The name of the object type (NUL terminated unless all 16 chars are used)
+     [optional].
+
+ (3) A function to select the cache in which to store an index [optional].
+
+     This function is invoked when an index needs to be instantiated in a cache
+     during the instantiation of a non-index object.  Only the immediate index
+     parent for the non-index object will be queried.  Any indices above that
+     in the hierarchy may be stored in multiple caches.  This function does not
+     need to be supplied for any non-index object or any index that will only
+     have index children.
+
+     If this function is not supplied or if it returns NULL then the first
+     cache in the parent's list will be chosed, or failing that, the first
+     cache in the master list.
+
+ (4) A function to retrieve an object's key from the netfs [mandatory].
+
+     This function will be called with the netfs data that was passed to the
+     cookie acquisition function and the maximum length of key data that it may
+     provide.  It should write the required key data into the given buffer and
+     return the quantity it wrote.
+
+ (5) A function to retrieve attribute data from the netfs [optional].
+
+     This function will be called with the netfs data that was passed to the
+     cookie acquisition function.  It should return the size of the file if
+     this is a data file.  The size may be used to govern how much cache must
+     be reserved for this file in the cache.
+
+     If the function is absent, a file size of 0 is assumed.
+
+ (6) A function to retrieve auxilliary data from the netfs [optional].
+
+     This function will be called with the netfs data that was passed to the
+     cookie acquisition function and the maximum length of auxilliary data that
+     it may provide.  It should write the auxilliary data into the given buffer
+     and return the quantity it wrote.
+
+     If this function is absent, the auxilliary data length will be set to 0.
+
+     The length of the auxilliary data buffer may be dependent on the key
+     length.  A netfs mustn't rely on being able to provide more than 400 bytes
+     for both.
+
+ (7) A function to check the auxilliary data [optional].
+
+     This function will be called to check that a match found in the cache for
+     this object is valid.  For instance with AFS it could check the auxilliary
+     data against the data version number returned by the server to determine
+     whether the index entry in a cache is still valid.
+
+     If this function is absent, it will be assumed that matching objects in a
+     cache are always valid.
+
+     If present, the function should return one of the following values:
+
+	(*) FSCACHE_CHECKAUX_OKAY		- the entry is okay as is
+	(*) FSCACHE_CHECKAUX_NEEDS_UPDATE	- the entry requires update
+	(*) FSCACHE_CHECKAUX_OBSOLETE		- the entry should be deleted
+
+     This function can also be used to extract data from the auxilliary data in
+     the cache and copy it into the netfs's structures.
+
+ (8) A pair of functions to manage contexts for the completion callback
+     [optional].
+
+     The cache read/write functions are passed a context which is then passed
+     to the I/O completion callback function.  To ensure this context remains
+     valid until after the I/O completion is called, two functions may be
+     provided: one to get an extra reference on the context, and one to drop a
+     reference to it.
+
+     If the context is not used or is a type of object that won't go out of
+     scope, then these functions are not required.  These functions are not
+     required for indices as indices may not contain data.  These functions may
+     be called in interrupt context and so may not sleep.
+
+ (9) A function to mark a page as retaining cache metadata [mandatory].
+
+     This is called by the cache to indicate that it is retaining in-memory
+     information for this page and that the netfs should uncache the page when
+     it has finished.  This does not indicate whether there's data on the disk
+     or not.  Note that several pages at once may be presented for marking.
+
+     kAFS and NFS use the PG_private bit on the page structure for this, but
+     that may not be appropriate in all cases.
+
+     This function is not required for indices as they're not permitted data.
+
+(10) A function to unmark all the pages retaining cache metadata [mandatory].
+
+     This is called by FS-Cache to indicate that a backing store is being
+     unbound from a cookie and that all the marks on the pages should be
+     cleared to prevent confusion.  Note that the cache will have torn down all
+     its tracking information so that the pages don't need to be explicitly
+     uncached.
+
+     This function is not required for indices as they're not permitted data.
+
+
+===================================
+NETWORK FILESYSTEM (UN)REGISTRATION
+===================================
+
+The first step is to declare the network filesystem to the cache.  This also
+involves specifying the layout of the primary index (for AFS, this would be the
+"cell" level).
+
+The registration function is:
+
+	int fscache_register_netfs(struct fscache_netfs *netfs);
+
+It just takes a pointer to the netfs definition.  It returns 0 or an error as
+appropriate.
+
+For kAFS, registration is done as follows:
+
+	ret = fscache_register_netfs(&afs_cache_netfs);
+
+The last step is, of course, unregistration:
+
+	void fscache_unregister_netfs(struct fscache_netfs *netfs);
+
+
+================
+CACHE TAG LOOKUP
+================
+
+FS-Cache permits the use of more than one cache.  To permit particular index
+subtrees to be bound to particular caches, the second step is to look up cache
+representation tags.  This step is optional; it can be left entirely up to
+FS-Cache as to which cache should be used.  The problem with doing that is that
+FS-Cache will always pick the first cache that was registered.
+
+To get the representation for a named tag:
+
+	struct fscache_cache_tag *fscache_lookup_cache_tag(const char *name);
+
+This takes a text string as the name and returns a representation of a tag.  It
+will never return an error.  It may return a dummy tag, however, if it runs out
+of memory; this will inhibit caching with this tag.
+
+Any representation so obtained must be released by passing it to this function:
+
+	void fscache_release_cache_tag(struct fscache_cache_tag *tag);
+
+The tag will be retrieved by FS-Cache when it calls the object definition
+operation select_cache().
+
+
+==================
+INDEX REGISTRATION
+==================
+
+The third step is to inform FS-Cache about part of an index hierarchy that can
+be used to locate files.  This is done by requesting a cookie for each index in
+the path to the file:
+
+	struct fscache_cookie *
+	fscache_acquire_cookie(struct fscache_cookie *parent,
+			       struct fscache_object_def *def,
+			       void *netfs_data);
+
+This function creates an index entry in the index represented by parent,
+filling in the index entry by calling the operations pointed to by def.
+
+Note that this function never returns an error - all errors are handled
+internally.  It may also return NULL to indicate no cookie.  It is quite
+acceptable to pass this token back to this function as the parent to another
+acquisition (or even to the relinquish cookie, read page and write page
+functions - see below).
+
+Note also that no indices are actually created in a cache until a non-index
+object needs to be created somewhere down the hierarchy.  Furthermore, an index
+may be created in several different caches independently at different times.
+This is all handled transparently, and the netfs doesn't see any of it.
+
+For example, with AFS, a cell would be added to the primary index.  This index
+entry would have a dependent inode containing a volume location index for the
+volume mappings within this cell:
+
+	cell->cache =
+		fscache_acquire_cookie(afs_cache_netfs.primary_index,
+				       &afs_cell_cache_index_def,
+				       cell);
+
+Then when a volume location was accessed, it would be entered into the cell's
+index and an inode would be allocated that acts as a volume type and hash chain
+combination:
+
+	vlocation->cache =
+		fscache_acquire_cookie(cell->cache,
+				       &afs_vlocation_cache_index_def,
+				       vlocation);
+
+And then a particular flavour of volume (R/O for example) could be added to
+that index, creating another index for vnodes (AFS inode equivalents):
+
+	volume->cache =
+		fscache_acquire_cookie(vlocation->cache,
+				       &afs_volume_cache_index_def,
+				       volume);
+
+
+======================
+DATA FILE REGISTRATION
+======================
+
+The fourth step is to request a data file be created in the cache.  This is
+identical to index cookie acquisition.  The only difference is that the type in
+the object definition should be something other than index type.
+
+	vnode->cache =
+		fscache_acquire_cookie(volume->cache,
+				       &afs_vnode_cache_object_def,
+				       vnode);
+
+
+=================================
+MISCELLANEOUS OBJECT REGISTRATION
+=================================
+
+An optional step is to request an object of miscellaneous type be created in
+the cache.  This is almost identical to index cookie acquisition.  The only
+difference is that the type in the object definition should be something other
+than index type.  Whilst the parent object could be an index, it's more likely
+it would be some other type of object such as a data file.
+
+	xattr->cache =
+		fscache_acquire_cookie(vnode->cache,
+				       &afs_xattr_cache_object_def,
+				       xattr);
+
+Miscellaneous objects might be used to store extended attributes or directory
+entries for example.
+
+
+==========================
+SETTING THE DATA FILE SIZE
+==========================
+
+The fifth step is to set the size of the file.  This doesn't automatically
+reserve any space in the cache, but permits the cache to adjust its metadata
+for data tracking appropriately:
+
+	int fscache_set_i_size(struct fscache_cookie *cookie, loff_t i_size);
+
+The cache will return -ENOBUFS if there is no backing cache or if there is no
+space to allocate any extra metadata required in the cache.
+
+Note that attempts to read or write data pages in the cache over this size may
+be rebuffed with -ENOBUFS.
+
+
+=====================
+PAGE READ/ALLOC/WRITE
+=====================
+
+And the sixth step is to store and retrieve pages in the cache.  There are
+three functions that are used to do this.
+
+Note:
+
+ (1) A page should not be re-read or re-allocated without uncaching it first.
+
+ (2) A read or allocated page must be uncached when the netfs page is released
+     from the pagecache.
+
+ (3) A page should only be written to the cache if previous read or allocated.
+
+This permits the cache to maintain its page tracking in proper order.
+
+
+PAGE READ
+---------
+
+Firstly, the netfs should ask FS-Cache to examine the caches and read the
+contents cached for a particular page of a particular file if present, or else
+allocate space to store the contents if not:
+
+	typedef
+	void (*fscache_rw_complete_t)(struct page *page,
+				      void *context,
+				      int error);
+
+	int fscache_read_or_alloc_page(struct fscache_cookie *cookie,
+				       struct page *page,
+				       fscache_rw_complete_t end_io_func,
+				       void *end_io_data,
+				       gfp_t gfp);
+
+The cookie argument must specify a cookie for an object that isn't an index,
+the page specified will have the data loaded into it (and is also used to
+specify the page number), and the gfp argument is used to control how any
+memory allocations made are satisfied.
+
+If the cookie indicates the inode is not cached:
+
+ (1) The function will return -ENOBUFS.
+
+Else if there's a copy of the page resident in the cache:
+
+ (1) The mark_pages_cached() cookie operation will be called on that page.
+
+ (2) The function will submit a request to read the data from the cache's
+     backing device directly into the page specified.
+
+ (3) The function will return 0.
+
+ (4) When the read is complete, end_io_func() will be invoked with:
+
+     (*) The netfs data supplied when the cookie was created.
+
+     (*) The page descriptor.
+
+     (*) The context argument passed to the above function.  This will be
+     	 maintained with the get_context/put_context functions mentioned above.
+
+     (*) An argument that's 0 on success or negative for an error code.
+
+     If an error occurs, it should be assumed that the page contains no usable
+     data.
+
+     end_io_func() will be called in process context if the read is results in
+     an error, but it might be called in interrupt context if the read is
+     successful.
+
+Otherwise, if there's not a copy available in cache, but the cache may be able
+to store the page:
+
+ (1) The mark_pages_cached() cookie operation will be called on that page.
+
+ (2) A block may be reserved in the cache and attached to the object at the
+     appropriate place.
+
+ (3) The function will return -ENODATA.
+
+This function may also return -ENOMEM or -EINTR, in which case it won't have
+read any data from the cache.
+
+
+PAGE ALLOCATE
+-------------
+
+Alternatively, if there's not expected to be any data in the cache for a page
+because the file has been extended, a block can simply be allocated instead:
+
+	int fscache_alloc_page(struct fscache_cookie *cookie,
+			       struct page *page,
+			       gfp_t gfp);
+
+This is similar to the fscache_read_or_alloc_page() function, except that it
+never reads from the cache.  It will return 0 if a block has been allocated,
+rather than -ENODATA as the other would.  One or the other must be performed
+before writing to the cache.
+
+The mark_pages_cached() cookie operation will be called on the page if
+successful.
+
+
+PAGE WRITE
+----------
+
+Secondly, if the netfs changes the contents of the page (either due to an
+initial download or if a user performs a write), then the page should be
+written back to the cache:
+
+	int fscache_write_page(struct fscache_cookie *cookie,
+			       struct page *page,
+			       fscache_rw_complete_t end_io_func,
+			       void *context,
+			       gfp_t gfp);
+
+The cookie argument must specify a data file cookie, the page specified should
+contain the data to be written (and is also used to specify the page number),
+and the gfp argument is used to control how any memory allocations made are
+satisfied.
+
+The page must have first been read or allocated successfully and must not have
+been uncached before writing is performed.
+
+If the cookie indicates the inode is not cached then:
+
+ (1) The function will return -ENOBUFS.
+
+Else if space can be allocated in the cache to hold this page:
+
+ (1) The function will submit a request to write the data to cache's backing
+     device directly from the page specified.
+
+ (2) The function will return 0.
+
+ (3) When the write is complete the end_io_func() will be invoked with:
+
+     (*) The netfs data supplied when the cookie was created.
+
+     (*) The page descriptor.
+
+     (*) The context argument passed to the function.  This will be maintained
+     	 with the get_context/put_context functions mentioned above.
+
+     (*) An argument that's 0 on success or negative for an error.
+
+     If an error occurs, it can be assumed that the page has not been written
+     to the cache, and that either there's a block containing the old data or
+     no block at all in the cache.
+
+     end_io_func() might be called in interrupt context.
+
+Else if there's no space available in the cache, -ENOBUFS will be returned.
+
+
+MULTIPLE PAGE READ
+------------------
+
+A facility is provided to read several pages at once, as requested by the
+readpages() address space operation:
+
+	int fscache_read_or_alloc_pages(struct fscache_cookie *cookie,
+					struct address_space *mapping,
+					struct list_head *pages,
+					int *nr_pages,
+					fscache_rw_complete_t end_io_func,
+					void *context,
+					gfp_t gfp);
+
+This works in a similar way to fscache_read_or_alloc_page(), except:
+
+ (1) Any page it can retrieve data for is removed from pages and nr_pages and
+     dispatched for reading to the disk.  Reads of adjacent pages on disk may
+     be merged for greater efficiency.
+
+ (2) The mark_pages_cached() cookie operation will be called on several pages
+     at once if they're being read or allocated.
+
+ (3) If there was an general error, then that error will be returned.
+
+     Else if some pages couldn't be allocated or read, then -ENOBUFS will be
+     returned.
+
+     Else if some pages couldn't be read but were allocated, then -ENODATA will
+     be returned.
+
+     Otherwise, if all pages had reads dispatched, then 0 will be returned, the
+     list will be empty and *nr_pages will be 0.
+
+ (4) end_io_func will be called once for each page being read as the reads
+     complete.  It will be called in process context if error != 0, but it may
+     be called in interrupt context if there is no error.
+
+Note that a return of -ENODATA, -ENOBUFS or any other error does not preclude
+some of the pages being read and some being allocated.  Those pages will have
+been marked appropriately and will need uncaching.
+
+
+==============
+PAGE UNCACHING
+==============
+
+To uncache a page, this function should be called:
+
+	void fscache_uncache_page(struct fscache_cookie *cookie,
+				  struct page *page);
+
+This function permits the cache to release any in-memory representation it
+might be holding for this netfs page.  This function must be called once for
+each page on which the read or write page functions above have been called to
+make sure the cache's in-memory tracking information gets torn down.
+
+Note that pages can't be explicitly deleted from the a data file.  The whole
+data file must be retired (see the relinquish cookie function below).
+
+Furthermore, note that this does not cancel the asynchronous read or write
+operation started by the read/alloc and write functions.
+
+There is another unbinding operation similar to the above that takes a set of
+pages to unbind in one go:
+
+	void fscache_uncache_pagevec(struct fscache_cookie *cookie,
+				     struct pagevec *pagevec);
+
+
+==========================
+INDEX AND DATA FILE UPDATE
+==========================
+
+To request an update of the index data for an index or other object, the
+following function should be called:
+
+	void fscache_update_cookie(struct fscache_cookie *cookie);
+
+This function will refer back to the netfs_data pointer stored in the cookie by
+the acquisition function to obtain the data to write into each revised index
+entry.  The update method in the parent index definition will be called to
+transfer the data.
+
+Note that partial updates may happen automatically at other times, such as when
+data blocks are added to a data file object.
+
+
+===============================
+MISCELLANEOUS COOKIE OPERATIONS
+===============================
+
+There are a number of operations that can be used to control cookies:
+
+ (*) Cookie pinning:
+
+	int fscache_pin_cookie(struct fscache_cookie *cookie);
+	void fscache_unpin_cookie(struct fscache_cookie *cookie);
+
+     These operations permit data cookies to be pinned into the cache and to
+     have the pinning removed.  They are not permitted on index cookies.
+
+     The pinning function will return 0 if successful, -ENOBUFS in the cookie
+     isn't backed by a cache, -EOPNOTSUPP if the cache doesn't support pinning,
+     -ENOSPC if there isn't enough space to honour the operation, -ENOMEM or
+     -EIO if there's any other problem.
+
+ (*) Data space reservation:
+
+	int fscache_reserve_space(struct fscache_cookie *cookie, loff_t size);
+
+     This permits a netfs to request cache space be reserved to store up to the
+     given amount of a file.  It is permitted to ask for more than the current
+     size of the file to allow for future file expansion.
+
+     If size is given as zero then the reservation will be cancelled.
+
+     The function will return 0 if successful, -ENOBUFS in the cookie isn't
+     backed by a cache, -EOPNOTSUPP if the cache doesn't support reservations,
+     -ENOSPC if there isn't enough space to honour the operation, -ENOMEM or
+     -EIO if there's any other problem.
+
+     Note that this doesn't pin an object in a cache; it can still be culled to
+     make space if it's not in use.
+
+
+=====================
+COOKIE UNREGISTRATION
+=====================
+
+To get rid of a cookie, this function should be called.
+
+	void fscache_relinquish_cookie(struct fscache_cookie *cookie,
+				       int retire);
+
+If retire is non-zero, then the object will be marked for recycling, and all
+copies of it will be removed from all active caches in which it is present.
+Not only that but all child objects will also be retired.
+
+If retire is zero, then the object may be available again when next the
+acquisition function is called.  Retirement here will overrule the pinning on a
+cookie.
+
+One very important note - relinquish must NOT be called for a cookie unless all
+the cookies for "child" indices, objects and pages have been relinquished
+first.
+
+
+================================
+INDEX AND DATA FILE INVALIDATION
+================================
+
+There is no direct way to invalidate an index subtree or a data file.  To do
+this, the caller should relinquish and retire the cookie they have, and then
+acquire a new one.
diff --git a/fs/Kconfig b/fs/Kconfig
index 3f00a9f..1a3d179 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -530,6 +530,21 @@ config FUSE_FS
 	  If you want to develop a userspace FS, or if you want to use
 	  a filesystem based on FUSE, answer Y or M.
 
+menu "Caches"
+
+config FSCACHE
+	tristate "General filesystem cache manager"
+	depends on EXPERIMENTAL
+	help
+	  This option enables a generic filesystem caching manager that can be
+	  used by various network and other filesystems to cache data
+	  locally. Different sorts of caches can be plugged in, depending on the
+	  resources available.
+
+	  See Documentation/filesystems/caching/fscache.txt for more information.
+
+endmenu
+
 menu "CD-ROM/DVD Filesystems"
 
 config ISO9660_FS
diff --git a/fs/Makefile b/fs/Makefile
index 8913542..0b0e827 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -52,6 +52,7 @@ obj-y				+= devpts/
 obj-$(CONFIG_PROFILING)		+= dcookies.o
  
 # Do not add any filesystems before this line
+obj-$(CONFIG_FSCACHE)		+= fscache/
 obj-$(CONFIG_REISERFS_FS)	+= reiserfs/
 obj-$(CONFIG_EXT3_FS)		+= ext3/ # Before ext2 so root fs can be ext3
 obj-$(CONFIG_JBD)		+= jbd/
diff --git a/fs/fscache/Makefile b/fs/fscache/Makefile
new file mode 100644
index 0000000..10f17a3
--- /dev/null
+++ b/fs/fscache/Makefile
@@ -0,0 +1,11 @@
+#
+# Makefile for general filesystem caching code
+#
+
+fscache-objs := \
+	cookie.o \
+	fsdef.o \
+	main.o \
+	page.o
+
+obj-$(CONFIG_FSCACHE) := fscache.o
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
new file mode 100644
index 0000000..9a6ac8b
--- /dev/null
+++ b/fs/fscache/cookie.c
@@ -0,0 +1,1045 @@
+/* cookie.c: general filesystem cache cookie management
+ *
+ * Copyright (C) 2004-5 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include "fscache-int.h"
+
+static LIST_HEAD(fscache_cache_tag_list);
+static LIST_HEAD(fscache_cache_list);
+static LIST_HEAD(fscache_netfs_list);
+static DECLARE_RWSEM(fscache_addremove_sem);
+static struct fscache_cache_tag fscache_nomem_tag;
+
+kmem_cache_t *fscache_cookie_jar;
+
+static void fscache_withdraw_object(struct fscache_cache *cache,
+				    struct fscache_object *object);
+
+static void __fscache_cookie_put(struct fscache_cookie *cookie);
+
+static inline void fscache_cookie_put(struct fscache_cookie *cookie)
+{
+	/* check to see whether the cookie has already been released by looking
+	 * for the poison when slab debugging is on */
+#ifdef CONFIG_DEBUG_SLAB
+	BUG_ON((atomic_read(&cookie->usage) & 0xffff0000) == 0x6b6b0000);
+#endif
+
+	BUG_ON(atomic_read(&cookie->usage) <= 0);
+
+	if (atomic_dec_and_test(&cookie->usage))
+		__fscache_cookie_put(cookie);
+
+}
+
+/*****************************************************************************/
+/*
+ * look up a cache tag
+ */
+struct fscache_cache_tag *__fscache_lookup_cache_tag(const char *name)
+{
+	struct fscache_cache_tag *tag, *xtag;
+
+	/* firstly check for the existence of the tag under read lock */
+	down_read(&fscache_addremove_sem);
+
+	list_for_each_entry(tag, &fscache_cache_tag_list, link) {
+		if (strcmp(tag->name, name) == 0) {
+			atomic_inc(&tag->usage);
+			up_read(&fscache_addremove_sem);
+			return tag;
+		}
+	}
+
+	up_read(&fscache_addremove_sem);
+
+	/* the tag does not exist - create a candidate */
+	xtag = kmalloc(sizeof(*xtag) + strlen(name) + 1, GFP_KERNEL);
+	if (!xtag)
+		/* return a dummy tag if out of memory */
+		return &fscache_nomem_tag;
+
+	atomic_set(&xtag->usage, 1);
+	strcpy(xtag->name, name);
+
+	/* write lock, search again and add if still not present */
+	down_write(&fscache_addremove_sem);
+
+	list_for_each_entry(tag, &fscache_cache_tag_list, link) {
+		if (strcmp(tag->name, name) == 0) {
+			atomic_inc(&tag->usage);
+			up_write(&fscache_addremove_sem);
+			kfree(xtag);
+			return tag;
+		}
+	}
+
+	list_add_tail(&xtag->link, &fscache_cache_tag_list);
+	up_write(&fscache_addremove_sem);
+	return xtag;
+}
+
+/*****************************************************************************/
+/*
+ * release a reference to a cache tag
+ */
+void __fscache_release_cache_tag(struct fscache_cache_tag *tag)
+{
+	if (tag != &fscache_nomem_tag) {
+		down_write(&fscache_addremove_sem);
+
+		if (atomic_dec_and_test(&tag->usage))
+			list_del_init(&tag->link);
+		else
+			tag = NULL;
+
+		up_write(&fscache_addremove_sem);
+
+		kfree(tag);
+	}
+}
+
+/*****************************************************************************/
+/*
+ * register a network filesystem for caching
+ */
+int __fscache_register_netfs(struct fscache_netfs *netfs)
+{
+	struct fscache_netfs *ptr;
+	int ret;
+
+	_enter("{%s}", netfs->name);
+
+	INIT_LIST_HEAD(&netfs->link);
+
+	/* allocate a cookie for the primary index */
+	netfs->primary_index =
+		kmem_cache_zalloc(fscache_cookie_jar, SLAB_KERNEL);
+
+	if (!netfs->primary_index) {
+		_leave(" = -ENOMEM");
+		return -ENOMEM;
+	}
+
+	/* initialise the primary index cookie */
+	atomic_set(&netfs->primary_index->usage, 1);
+	atomic_set(&netfs->primary_index->children, 0);
+
+	netfs->primary_index->def		= &fscache_fsdef_netfs_def;
+	netfs->primary_index->parent		= &fscache_fsdef_index;
+	netfs->primary_index->netfs		= netfs;
+	netfs->primary_index->netfs_data	= netfs;
+
+	atomic_inc(&netfs->primary_index->parent->usage);
+	atomic_inc(&netfs->primary_index->parent->children);
+
+	init_rwsem(&netfs->primary_index->sem);
+	INIT_HLIST_HEAD(&netfs->primary_index->backing_objects);
+
+	/* check the netfs type is not already present */
+	down_write(&fscache_addremove_sem);
+
+	ret = -EEXIST;
+	list_for_each_entry(ptr, &fscache_netfs_list, link) {
+		if (strcmp(ptr->name, netfs->name) == 0)
+			goto already_registered;
+	}
+
+	list_add(&netfs->link, &fscache_netfs_list);
+	ret = 0;
+
+	printk("FS-Cache: netfs '%s' registered for caching\n", netfs->name);
+
+already_registered:
+	up_write(&fscache_addremove_sem);
+
+	if (ret < 0) {
+		netfs->primary_index->parent = NULL;
+		__fscache_cookie_put(netfs->primary_index);
+		netfs->primary_index = NULL;
+	}
+
+	_leave(" = %d", ret);
+	return ret;
+}
+
+EXPORT_SYMBOL(__fscache_register_netfs);
+
+/*****************************************************************************/
+/*
+ * unregister a network filesystem from the cache
+ * - all cookies must have been released first
+ */
+void __fscache_unregister_netfs(struct fscache_netfs *netfs)
+{
+	_enter("{%s.%u}", netfs->name, netfs->version);
+
+	down_write(&fscache_addremove_sem);
+
+	list_del(&netfs->link);
+	fscache_relinquish_cookie(netfs->primary_index, 0);
+
+	up_write(&fscache_addremove_sem);
+
+	printk("FS-Cache: netfs '%s' unregistered from caching\n",
+	       netfs->name);
+
+	_leave("");
+}
+
+EXPORT_SYMBOL(__fscache_unregister_netfs);
+
+/*****************************************************************************/
+/*
+ * initialise a cache record
+ */
+void fscache_init_cache(struct fscache_cache *cache,
+			struct fscache_cache_ops *ops,
+			const char *idfmt,
+			...)
+{
+	va_list va;
+
+	memset(cache, 0, sizeof(*cache));
+
+	cache->ops = ops;
+
+	va_start(va, idfmt);
+	vsnprintf(cache->identifier, sizeof(cache->identifier), idfmt, va);
+	va_end(va);
+
+	INIT_LIST_HEAD(&cache->link);
+	INIT_LIST_HEAD(&cache->object_list);
+	spin_lock_init(&cache->object_list_lock);
+	init_rwsem(&cache->withdrawal_sem);
+}
+
+EXPORT_SYMBOL(fscache_init_cache);
+
+/*****************************************************************************/
+/*
+ * declare a mounted cache as being open for business
+ */
+int fscache_add_cache(struct fscache_cache *cache,
+		      struct fscache_object *ifsdef,
+		      const char *tagname)
+{
+	struct fscache_cache_tag *tag;
+
+	BUG_ON(!cache->ops);
+	BUG_ON(!ifsdef);
+
+	cache->flags = 0;
+
+	if (!tagname)
+		tagname = cache->identifier;
+
+	BUG_ON(!tagname[0]);
+
+	_enter("{%s.%s},,%s", cache->ops->name, cache->identifier, tagname);
+
+	if (!cache->ops->grab_object(ifsdef))
+		BUG();
+
+	ifsdef->cookie = &fscache_fsdef_index;
+	ifsdef->cache = cache;
+	cache->fsdef = ifsdef;
+
+	down_write(&fscache_addremove_sem);
+
+	/* instantiate or allocate a cache tag */
+	list_for_each_entry(tag, &fscache_cache_tag_list, link) {
+		if (strcmp(tag->name, tagname) == 0) {
+			if (tag->cache) {
+				printk(KERN_ERR
+				       "FS-Cache: cache tag '%s' already in use\n",
+				       tagname);
+				up_write(&fscache_addremove_sem);
+				return -EEXIST;
+			}
+
+			atomic_inc(&tag->usage);
+			goto found_cache_tag;
+		}
+	}
+
+	tag = kmalloc(sizeof(*tag) + strlen(tagname) + 1, GFP_KERNEL);
+	if (!tag) {
+		up_write(&fscache_addremove_sem);
+		return -ENOMEM;
+	}
+
+	atomic_set(&tag->usage, 1);
+	strcpy(tag->name, tagname);
+	list_add_tail(&tag->link, &fscache_cache_tag_list);
+
+found_cache_tag:
+	tag->cache = cache;
+	cache->tag = tag;
+
+	/* add the cache to the list */
+	list_add(&cache->link, &fscache_cache_list);
+
+	/* add the cache's netfs definition index object to the cache's
+	 * list */
+	spin_lock(&cache->object_list_lock);
+	list_add_tail(&ifsdef->cache_link, &cache->object_list);
+	spin_unlock(&cache->object_list_lock);
+
+	/* add the cache's netfs definition index object to the top level index
+	 * cookie as a known backing object */
+	down_write(&fscache_fsdef_index.sem);
+
+	hlist_add_head(&ifsdef->cookie_link,
+		       &fscache_fsdef_index.backing_objects);
+
+	atomic_inc(&fscache_fsdef_index.usage);
+
+	/* done */
+	up_write(&fscache_fsdef_index.sem);
+	up_write(&fscache_addremove_sem);
+
+	printk(KERN_NOTICE
+	       "FS-Cache: Cache \"%s\" added (type %s)\n",
+	       cache->tag->name, cache->ops->name);
+
+	_leave(" = 0 [%s]", cache->identifier);
+	return 0;
+}
+
+EXPORT_SYMBOL(fscache_add_cache);
+
+/*****************************************************************************/
+/*
+ * note a cache I/O error
+ */
+void fscache_io_error(struct fscache_cache *cache)
+{
+	set_bit(FSCACHE_IOERROR, &cache->flags);
+
+	printk(KERN_ERR "FS-Cache: Cache %s stopped due to I/O error\n",
+	       cache->ops->name);
+}
+
+EXPORT_SYMBOL(fscache_io_error);
+
+/*****************************************************************************/
+/*
+ * withdraw an unmounted cache from the active service
+ */
+void fscache_withdraw_cache(struct fscache_cache *cache)
+{
+	struct fscache_object *object;
+
+	_enter("");
+
+	printk(KERN_NOTICE
+	       "FS-Cache: Withdrawing cache \"%s\"\n",
+	       cache->tag->name);
+
+	/* make the cache unavailable for cookie acquisition */
+	down_write(&cache->withdrawal_sem);
+
+	down_write(&fscache_addremove_sem);
+	list_del_init(&cache->link);
+	cache->tag->cache = NULL;
+	up_write(&fscache_addremove_sem);
+
+	/* mark all objects as being withdrawn */
+	spin_lock(&cache->object_list_lock);
+	list_for_each_entry(object, &cache->object_list, cache_link) {
+		set_bit(FSCACHE_OBJECT_WITHDRAWN, &object->flags);
+	}
+	spin_unlock(&cache->object_list_lock);
+
+	/* make sure all pages pinned by operations on behalf of the netfs are
+	 * written to disc */
+	cache->ops->sync_cache(cache);
+
+	/* dissociate all the netfs pages backed by this cache from the block
+	 * mappings in the cache */
+	cache->ops->dissociate_pages(cache);
+
+	/* we now have to destroy all the active objects pertaining to this
+	 * cache */
+	spin_lock(&cache->object_list_lock);
+
+	while (!list_empty(&cache->object_list)) {
+		object = list_entry(cache->object_list.next,
+				    struct fscache_object, cache_link);
+		list_del_init(&object->cache_link);
+		spin_unlock(&cache->object_list_lock);
+
+		_debug("withdraw %p", object->cookie);
+
+		/* we've extracted an active object from the tree - now dispose
+		 * of it */
+		fscache_withdraw_object(cache, object);
+
+		spin_lock(&cache->object_list_lock);
+	}
+
+	spin_unlock(&cache->object_list_lock);
+
+	fscache_release_cache_tag(cache->tag);
+	cache->tag = NULL;
+
+	_leave("");
+}
+
+EXPORT_SYMBOL(fscache_withdraw_cache);
+
+/*****************************************************************************/
+/*
+ * withdraw an object from active service at the behest of the cache
+ * - need break the links to a cached object cookie
+ * - called under two situations:
+ *   (1) recycler decides to reclaim an in-use object
+ *   (2) a cache is unmounted
+ * - have to take care as the cookie can be being relinquished by the netfs
+ *   simultaneously
+ * - the active object is pinned by the caller holding a refcount on it
+ */
+static void fscache_withdraw_object(struct fscache_cache *cache,
+				    struct fscache_object *object)
+{
+	struct fscache_cookie *cookie, *xcookie = NULL;
+
+	_enter(",%p", object);
+
+	/* first of all we have to break the links between the object and the
+	 * cookie
+	 * - we have to hold both semaphores BUT we have to get the cookie sem
+	 *   FIRST
+	 */
+	cache->ops->lock_object(object);
+
+	cookie = object->cookie;
+	if (cookie) {
+		/* pin the cookie so that is doesn't escape */
+		atomic_inc(&cookie->usage);
+
+		/* re-order the locks to avoid deadlock */
+		cache->ops->unlock_object(object);
+		down_write(&cookie->sem);
+		cache->ops->lock_object(object);
+
+		/* erase references from the object to the cookie */
+		hlist_del_init(&object->cookie_link);
+
+		xcookie = object->cookie;
+		object->cookie = NULL;
+
+		up_write(&cookie->sem);
+	}
+
+	cache->ops->unlock_object(object);
+
+	/* we've broken the links between cookie and object */
+	if (xcookie) {
+		fscache_cookie_put(xcookie);
+		cache->ops->put_object(object);
+	}
+
+	/* unpin the cookie */
+	if (cookie) {
+		if (cookie->def && cookie->def->now_uncached)
+			cookie->def->now_uncached(cookie->netfs_data);
+		fscache_cookie_put(cookie);
+	}
+
+	_leave("");
+}
+
+/*****************************************************************************/
+/*
+ * select a cache on which to store an object
+ * - the cache addremove semaphore must be at least read-locked by the caller
+ * - the object will never be an index
+ */
+static struct fscache_cache *fscache_select_cache_for_object(struct fscache_cookie *cookie)
+{
+	struct fscache_cache_tag *tag;
+	struct fscache_object *object;
+	struct fscache_cache *cache;
+
+	_enter("");
+
+	if (list_empty(&fscache_cache_list)) {
+		_leave(" = NULL [no cache]");
+		return NULL;
+	}
+
+	/* we check the parent to determine the cache to use */
+	down_read(&cookie->parent->sem);
+
+	/* the first in the parent's backing list should be the preferred
+	 * cache */
+	if (!hlist_empty(&cookie->parent->backing_objects)) {
+		object = hlist_entry(cookie->parent->backing_objects.first,
+				     struct fscache_object, cookie_link);
+
+		cache = object->cache;
+		if (test_bit(FSCACHE_IOERROR, &cache->flags))
+			cache = NULL;
+
+		up_read(&cookie->parent->sem);
+		_leave(" = %p [parent]", cache);
+		return cache;
+	}
+
+	/* the parent is unbacked */
+	if (cookie->parent->def->type != FSCACHE_COOKIE_TYPE_INDEX) {
+		/* parent not an index and is unbacked */
+		up_read(&cookie->parent->sem);
+		_leave(" = NULL [parent ubni]");
+		return NULL;
+	}
+
+	up_read(&cookie->parent->sem);
+
+	if (!cookie->parent->def->select_cache)
+		goto no_preference;
+
+	/* ask the netfs for its preference */
+	tag = cookie->parent->def->select_cache(
+		cookie->parent->parent->netfs_data,
+		cookie->parent->netfs_data);
+
+	if (!tag)
+		goto no_preference;
+
+	if (tag == &fscache_nomem_tag) {
+		_leave(" = NULL [nomem tag]");
+		return NULL;
+	}
+
+	if (!tag->cache) {
+		_leave(" = NULL [unbacked tag]");
+		return NULL;
+	}
+
+	if (test_bit(FSCACHE_IOERROR, &tag->cache->flags))
+		return NULL;
+
+	_leave(" = %p [specific]", tag->cache);
+	return tag->cache;
+
+no_preference:
+	/* netfs has no preference - just select first cache */
+	cache = list_entry(fscache_cache_list.next,
+			   struct fscache_cache, link);
+	_leave(" = %p [first]", cache);
+	return cache;
+}
+
+/*****************************************************************************/
+/*
+ * get a backing object for a cookie from the chosen cache
+ * - the cookie must be write-locked by the caller
+ * - all parent indexes will be obtained recursively first
+ */
+static struct fscache_object *fscache_lookup_object(struct fscache_cookie *cookie,
+						    struct fscache_cache *cache)
+{
+	struct fscache_cookie *parent = cookie->parent;
+	struct fscache_object *pobject, *object;
+	struct hlist_node *_p;
+
+	_enter("{%s/%s},",
+	       parent && parent->def ? parent->def->name : "",
+	       cookie->def ? (char *) cookie->def->name : "<file>");
+
+	if (test_bit(FSCACHE_IOERROR, &cache->flags))
+		return NULL;
+
+	/* see if we have the backing object for this cookie + cache immediately
+	 * to hand
+	 */
+	object = NULL;
+	hlist_for_each_entry(object, _p,
+			     &cookie->backing_objects, cookie_link
+			     ) {
+		if (object->cache == cache)
+			break;
+	}
+
+	if (object) {
+		_leave(" = %p [old]", object);
+		return object;
+	}
+
+	BUG_ON(!parent); /* FSDEF entries don't have a parent */
+
+	/* we don't have a backing cookie, so we need to consult the object's
+	 * parent index in the selected cache and maybe insert an entry
+	 * therein; so the first thing to do is make sure that the parent index
+	 * is represented on disc
+	 */
+	down_read(&parent->sem);
+
+	pobject = NULL;
+	hlist_for_each_entry(pobject, _p,
+			     &parent->backing_objects, cookie_link
+			     ) {
+		if (pobject->cache == cache)
+			break;
+	}
+
+	if (!pobject) {
+		/* we don't know about the parent object */
+		up_read(&parent->sem);
+		down_write(&parent->sem);
+
+		pobject = fscache_lookup_object(parent, cache);
+		if (IS_ERR(pobject)) {
+			up_write(&parent->sem);
+			_leave(" = %ld [no ipobj]", PTR_ERR(pobject));
+			return pobject;
+		}
+
+		_debug("pobject=%p", pobject);
+
+		BUG_ON(pobject->cookie != parent);
+
+		downgrade_write(&parent->sem);
+	}
+
+	/* now we can attempt to look up this object in the parent, possibly
+	 * creating a representation on disc when we do so
+	 */
+	object = cache->ops->lookup_object(cache, pobject, cookie);
+	up_read(&parent->sem);
+
+	if (IS_ERR(object)) {
+		_leave(" = %ld [no obj]", PTR_ERR(object));
+		return object;
+	}
+
+	/* keep track of it */
+	cache->ops->lock_object(object);
+
+	BUG_ON(!hlist_unhashed(&object->cookie_link));
+
+	/* attach to the cache's object list */
+	if (list_empty(&object->cache_link)) {
+		spin_lock(&cache->object_list_lock);
+		list_add(&object->cache_link, &cache->object_list);
+		spin_unlock(&cache->object_list_lock);
+	}
+
+	/* attach to the cookie */
+	object->cookie = cookie;
+	atomic_inc(&cookie->usage);
+	hlist_add_head(&object->cookie_link, &cookie->backing_objects);
+
+	/* done */
+	cache->ops->unlock_object(object);
+	_leave(" = %p [new]", object);
+	return object;
+}
+
+/*****************************************************************************/
+/*
+ * request a cookie to represent an object (index, datafile, xattr, etc)
+ * - parent specifies the parent object
+ *   - the top level index cookie for each netfs is stored in the fscache_netfs
+ *     struct upon registration
+ * - idef points to the definition
+ * - the netfs_data will be passed to the functions pointed to in *def
+ * - all attached caches will be searched to see if they contain this object
+ * - index objects aren't stored on disk until there's a dependent file that
+ *   needs storing
+ * - other objects are stored in a selected cache immediately, and all the
+ *   indexes forming the path to it are instantiated if necessary
+ * - we never let on to the netfs about errors
+ *   - we may set a negative cookie pointer, but that's okay
+ */
+struct fscache_cookie *__fscache_acquire_cookie(struct fscache_cookie *parent,
+						struct fscache_cookie_def *def,
+						void *netfs_data)
+{
+	struct fscache_cookie *cookie;
+	struct fscache_cache *cache;
+	struct fscache_object *object;
+	int ret = 0;
+
+	BUG_ON(!def);
+
+	_enter("{%s},{%s},%p",
+	       parent ? (char *) parent->def->name : "<no-parent>",
+	       def->name, netfs_data);
+
+	/* if there's no parent cookie, then we don't create one here either */
+	if (!parent) {
+		_leave(" [no parent]");
+		return NULL;
+	}
+
+	/* validate the definition */
+	BUG_ON(!def->get_key);
+	BUG_ON(!def->name[0]);
+
+	BUG_ON(def->type == FSCACHE_COOKIE_TYPE_INDEX &&
+	       parent->def->type != FSCACHE_COOKIE_TYPE_INDEX);
+
+	/* allocate and initialise a cookie */
+	cookie = kmem_cache_alloc(fscache_cookie_jar, SLAB_KERNEL);
+	if (!cookie) {
+		_leave(" [ENOMEM]");
+		return NULL;
+	}
+
+	atomic_set(&cookie->usage, 1);
+	atomic_set(&cookie->children, 0);
+
+	atomic_inc(&parent->usage);
+	atomic_inc(&parent->children);
+
+	cookie->def		= def;
+	cookie->parent		= parent;
+	cookie->netfs		= parent->netfs;
+	cookie->netfs_data	= netfs_data;
+
+	/* now we need to see whether the backing objects for this cookie yet
+	 * exist, if not there'll be nothing to search */
+	down_read(&fscache_addremove_sem);
+
+	if (list_empty(&fscache_cache_list)) {
+		up_read(&fscache_addremove_sem);
+		_leave(" = %p [no caches]", cookie);
+		return cookie;
+	}
+
+	/* if the object is an index then we need do nothing more here - we
+	 * create indexes on disk when we need them as an index may exist in
+	 * multiple caches */
+	if (cookie->def->type != FSCACHE_COOKIE_TYPE_INDEX) {
+		down_write(&cookie->sem);
+
+		/* the object is a file - we need to select a cache in which to
+		 * store it */
+		cache = fscache_select_cache_for_object(cookie);
+		if (!cache)
+			goto no_cache; /* couldn't decide on a cache */
+
+		/* create a file index entry on disc, along with all the
+		 * indexes required to find it again later */
+		object = fscache_lookup_object(cookie, cache);
+		if (IS_ERR(object)) {
+			ret = PTR_ERR(object);
+			goto error;
+		}
+
+		up_write(&cookie->sem);
+	}
+out:
+	up_read(&fscache_addremove_sem);
+	_leave(" = %p", cookie);
+	return cookie;
+
+no_cache:
+	ret = -ENOMEDIUM;
+	goto error_cleanup;
+error:
+	printk(KERN_ERR "FS-Cache: error from cache: %d\n", ret);
+error_cleanup:
+	if (cookie) {
+		up_write(&cookie->sem);
+		__fscache_cookie_put(cookie);
+		cookie = NULL;
+		atomic_dec(&parent->children);
+	}
+
+	goto out;
+}
+
+EXPORT_SYMBOL(__fscache_acquire_cookie);
+
+/*****************************************************************************/
+/*
+ * release a cookie back to the cache
+ * - the object will be marked as recyclable on disc if retire is true
+ * - all dependents of this cookie must have already been unregistered
+ *   (indexes/files/pages)
+ */
+void __fscache_relinquish_cookie(struct fscache_cookie *cookie, int retire)
+{
+	struct fscache_cache *cache;
+	struct fscache_object *object;
+	struct hlist_node *_p;
+
+	if (!cookie) {
+		_leave(" [no cookie]");
+		return;
+	}
+
+	_enter("%p{%s},%d", cookie, cookie->def->name, retire);
+
+	if (atomic_read(&cookie->children) != 0) {
+		printk("FS-Cache: cookie still has children\n");
+		BUG();
+	}
+
+	/* detach pointers back to the netfs */
+	down_write(&cookie->sem);
+
+	cookie->netfs_data	= NULL;
+	cookie->def		= NULL;
+
+	/* mark retired objects for recycling */
+	if (retire) {
+		hlist_for_each_entry(object, _p,
+				     &cookie->backing_objects,
+				     cookie_link
+				     ) {
+			set_bit(FSCACHE_OBJECT_RECYCLING, &object->flags);
+		}
+	}
+
+	/* break links with all the active objects */
+	while (!hlist_empty(&cookie->backing_objects)) {
+		object = hlist_entry(cookie->backing_objects.first,
+				     struct fscache_object,
+				     cookie_link);
+
+		/* detach each cache object from the object cookie */
+		set_bit(FSCACHE_OBJECT_RELEASING, &object->flags);
+
+		hlist_del_init(&object->cookie_link);
+
+		cache = object->cache;
+		cache->ops->lock_object(object);
+		object->cookie = NULL;
+		cache->ops->unlock_object(object);
+
+		if (atomic_dec_and_test(&cookie->usage))
+			/* the cookie refcount shouldn't be reduced to 0 yet */
+			BUG();
+
+		spin_lock(&cache->object_list_lock);
+		list_del_init(&object->cache_link);
+		spin_unlock(&cache->object_list_lock);
+
+		cache->ops->put_object(object);
+	}
+
+	up_write(&cookie->sem);
+
+	if (cookie->parent) {
+#ifdef CONFIG_DEBUG_SLAB
+		BUG_ON((atomic_read(&cookie->parent->children) & 0xffff0000) == 0x6b6b0000);
+#endif
+		atomic_dec(&cookie->parent->children);
+	}
+
+	/* finally dispose of the cookie */
+	fscache_cookie_put(cookie);
+
+	_leave("");
+}
+
+EXPORT_SYMBOL(__fscache_relinquish_cookie);
+
+/*****************************************************************************/
+/*
+ * update the index entries backing a cookie
+ */
+void __fscache_update_cookie(struct fscache_cookie *cookie)
+{
+	struct fscache_object *object;
+	struct hlist_node *_p;
+
+	if (!cookie) {
+		_leave(" [no cookie]");
+		return;
+	}
+
+	_enter("{%s}", cookie->def->name);
+
+	BUG_ON(!cookie->def->get_aux);
+
+	down_write(&cookie->sem);
+	down_read(&cookie->parent->sem);
+
+	/* update the index entry on disc in each cache backing this cookie */
+	hlist_for_each_entry(object, _p,
+			     &cookie->backing_objects, cookie_link
+			     ) {
+		if (!test_bit(FSCACHE_IOERROR, &object->cache->flags))
+			object->cache->ops->update_object(object);
+	}
+
+	up_read(&cookie->parent->sem);
+	up_write(&cookie->sem);
+	_leave("");
+}
+
+EXPORT_SYMBOL(__fscache_update_cookie);
+
+/*****************************************************************************/
+/*
+ * destroy a cookie
+ */
+static void __fscache_cookie_put(struct fscache_cookie *cookie)
+{
+	struct fscache_cookie *parent;
+
+	_enter("%p", cookie);
+
+	for (;;) {
+		parent = cookie->parent;
+		BUG_ON(!hlist_empty(&cookie->backing_objects));
+		kmem_cache_free(fscache_cookie_jar, cookie);
+
+		if (!parent)
+			break;
+
+		cookie = parent;
+		BUG_ON(atomic_read(&cookie->usage) <= 0);
+		if (!atomic_dec_and_test(&cookie->usage))
+			break;
+	}
+
+	_leave("");
+}
+
+/*****************************************************************************/
+/*
+ * initialise an cookie jar slab element prior to any use
+ */
+void fscache_cookie_init_once(void *_cookie, kmem_cache_t *cachep,
+			      unsigned long flags)
+{
+	struct fscache_cookie *cookie = _cookie;
+
+	if ((flags & (SLAB_CTOR_VERIFY|SLAB_CTOR_CONSTRUCTOR)) ==
+	    SLAB_CTOR_CONSTRUCTOR) {
+		memset(cookie, 0, sizeof(*cookie));
+		init_rwsem(&cookie->sem);
+		INIT_HLIST_HEAD(&cookie->backing_objects);
+	}
+}
+
+/*****************************************************************************/
+/*
+ * pin an object into the cache
+ */
+int __fscache_pin_cookie(struct fscache_cookie *cookie)
+{
+	struct fscache_object *object;
+	int ret;
+
+	_enter("%p", cookie);
+
+	if (hlist_empty(&cookie->backing_objects)) {
+		_leave(" = -ENOBUFS");
+		return -ENOBUFS;
+	}
+
+	/* not supposed to use this for indexes */
+	BUG_ON(cookie->def->type == FSCACHE_COOKIE_TYPE_INDEX);
+
+	/* prevent the file from being uncached whilst we access it and exclude
+	 * read and write attempts on pages
+	 */
+	down_write(&cookie->sem);
+
+	ret = -ENOBUFS;
+	if (!hlist_empty(&cookie->backing_objects)) {
+		/* get and pin the backing object */
+		object = hlist_entry(cookie->backing_objects.first,
+				     struct fscache_object, cookie_link);
+
+		if (test_bit(FSCACHE_IOERROR, &object->cache->flags))
+			goto out;
+
+		if (!object->cache->ops->pin_object) {
+			ret = -EOPNOTSUPP;
+			goto out;
+		}
+
+		/* prevent the cache from being withdrawn */
+		if (fscache_operation_lock(object)) {
+			if (object->cache->ops->grab_object(object)) {
+				/* ask the cache to honour the operation */
+				ret = object->cache->ops->pin_object(object);
+
+				object->cache->ops->put_object(object);
+			}
+
+			fscache_operation_unlock(object);
+		}
+	}
+
+out:
+	up_write(&cookie->sem);
+	_leave(" = %d", ret);
+	return ret;
+}
+
+EXPORT_SYMBOL(__fscache_pin_cookie);
+
+/*****************************************************************************/
+/*
+ * unpin an object into the cache
+ */
+void __fscache_unpin_cookie(struct fscache_cookie *cookie)
+{
+	struct fscache_object *object;
+	int ret;
+
+	_enter("%p", cookie);
+
+	if (hlist_empty(&cookie->backing_objects)) {
+		_leave(" [no obj]");
+		return;
+	}
+
+	/* not supposed to use this for indexes */
+	BUG_ON(cookie->def->type == FSCACHE_COOKIE_TYPE_INDEX);
+
+	/* prevent the file from being uncached whilst we access it and exclude
+	 * read and write attempts on pages
+	 */
+	down_write(&cookie->sem);
+
+	ret = -ENOBUFS;
+	if (!hlist_empty(&cookie->backing_objects)) {
+		/* get and unpin the backing object */
+		object = hlist_entry(cookie->backing_objects.first,
+				     struct fscache_object, cookie_link);
+
+		if (test_bit(FSCACHE_IOERROR, &object->cache->flags))
+			goto out;
+
+		if (!object->cache->ops->unpin_object)
+			goto out;
+
+		/* prevent the cache from being withdrawn */
+		if (fscache_operation_lock(object)) {
+			if (object->cache->ops->grab_object(object)) {
+				/* ask the cache to honour the operation */
+				object->cache->ops->unpin_object(object);
+
+				object->cache->ops->put_object(object);
+			}
+
+			fscache_operation_unlock(object);
+		}
+	}
+
+out:
+	up_write(&cookie->sem);
+	_leave("");
+}
+
+EXPORT_SYMBOL(__fscache_unpin_cookie);
diff --git a/fs/fscache/fscache-int.h b/fs/fscache/fscache-int.h
new file mode 100644
index 0000000..d075660
--- /dev/null
+++ b/fs/fscache/fscache-int.h
@@ -0,0 +1,93 @@
+/* fscache-int.h: internal definitions
+ *
+ * Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _FSCACHE_INT_H
+#define _FSCACHE_INT_H
+
+#include <linux/fscache-cache.h>
+#include <linux/timer.h>
+#include <linux/bio.h>
+
+extern kmem_cache_t *fscache_cookie_jar;
+
+extern struct fscache_cookie fscache_fsdef_index;
+extern struct fscache_cookie_def fscache_fsdef_netfs_def;
+
+extern void fscache_cookie_init_once(void *_cookie, kmem_cache_t *cachep, unsigned long flags);
+
+/*
+ * prevent the cache from being withdrawn whilst an operation is in progress
+ * - returns false if the cache is being withdrawn already or if the cache is
+ *   waiting to withdraw itself
+ * - returns true if the cache was not being withdrawn
+ * - fscache_withdraw_cache() will wait using down_write() until all ops are
+ *   complete
+ */
+static inline int fscache_operation_lock(struct fscache_object *object)
+{
+	return down_read_trylock(&object->cache->withdrawal_sem);
+}
+
+/*
+ * release the operation lock
+ */
+static inline void fscache_operation_unlock(struct fscache_object *object)
+{
+	up_read(&object->cache->withdrawal_sem);
+}
+
+
+/*****************************************************************************/
+/*
+ * debug tracing
+ */
+#define dbgprintk(FMT,...) \
+	printk("[%-6.6s] "FMT"\n",current->comm ,##__VA_ARGS__)
+#define _dbprintk(FMT,...) do { } while(0)
+
+#define kenter(FMT,...)	dbgprintk("==> %s("FMT")",__FUNCTION__ ,##__VA_ARGS__)
+#define kleave(FMT,...)	dbgprintk("<== %s()"FMT"",__FUNCTION__ ,##__VA_ARGS__)
+#define kdebug(FMT,...)	dbgprintk(FMT ,##__VA_ARGS__)
+
+#define kjournal(FMT,...) _dbprintk(FMT ,##__VA_ARGS__)
+
+#define dbgfree(ADDR)  _dbprintk("%p:%d: FREEING %p",__FILE__,__LINE__,ADDR)
+
+#define dbgpgalloc(PAGE)						\
+do {									\
+	_dbprintk("PGALLOC %s:%d: %p {%lx,%lu}\n",			\
+		  __FILE__,__LINE__,					\
+		  (PAGE),(PAGE)->mapping->host->i_ino,(PAGE)->index	\
+		  );							\
+} while(0)
+
+#define dbgpgfree(PAGE)						\
+do {								\
+	if ((PAGE))						\
+		_dbprintk("PGFREE %s:%d: %p {%lx,%lu}\n",	\
+			  __FILE__,__LINE__,			\
+			  (PAGE),				\
+			  (PAGE)->mapping->host->i_ino,		\
+			  (PAGE)->index				\
+			  );					\
+} while(0)
+
+#ifdef __KDEBUG
+#define _enter(FMT,...)	kenter(FMT,##__VA_ARGS__)
+#define _leave(FMT,...)	kleave(FMT,##__VA_ARGS__)
+#define _debug(FMT,...)	kdebug(FMT,##__VA_ARGS__)
+#else
+#define _enter(FMT,...)	do { } while(0)
+#define _leave(FMT,...)	do { } while(0)
+#define _debug(FMT,...)	do { } while(0)
+#endif
+
+#endif /* _FSCACHE_INT_H */
diff --git a/fs/fscache/fsdef.c b/fs/fscache/fsdef.c
new file mode 100644
index 0000000..495f633
--- /dev/null
+++ b/fs/fscache/fsdef.c
@@ -0,0 +1,110 @@
+/* fsdef.c: filesystem index definition
+ *
+ * Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include "fscache-int.h"
+
+static uint16_t fscache_fsdef_netfs_get_key(const void *cookie_netfs_data,
+					    void *buffer, uint16_t bufmax);
+
+static uint16_t fscache_fsdef_netfs_get_aux(const void *cookie_netfs_data,
+					    void *buffer, uint16_t bufmax);
+
+static fscache_checkaux_t fscache_fsdef_netfs_check_aux(void *cookie_netfs_data,
+							const void *data,
+							uint16_t datalen);
+
+struct fscache_cookie_def fscache_fsdef_netfs_def = {
+	.name		= "FSDEF.netfs",
+	.type		= FSCACHE_COOKIE_TYPE_INDEX,
+	.get_key	= fscache_fsdef_netfs_get_key,
+	.get_aux	= fscache_fsdef_netfs_get_aux,
+	.check_aux	= fscache_fsdef_netfs_check_aux,
+};
+
+struct fscache_cookie fscache_fsdef_index = {
+	.usage		= ATOMIC_INIT(1),
+	.def		= NULL,
+	.sem		= __RWSEM_INITIALIZER(fscache_fsdef_index.sem),
+	.backing_objects = HLIST_HEAD_INIT,
+};
+
+EXPORT_SYMBOL(fscache_fsdef_index);
+
+/*****************************************************************************/
+/*
+ * get the key data for an FSDEF index record
+ */
+static uint16_t fscache_fsdef_netfs_get_key(const void *cookie_netfs_data,
+					    void *buffer, uint16_t bufmax)
+{
+	const struct fscache_netfs *netfs = cookie_netfs_data;
+	unsigned klen;
+
+	_enter("{%s.%u},", netfs->name, netfs->version);
+
+	klen = strlen(netfs->name);
+	if (klen > bufmax)
+		return 0;
+
+	memcpy(buffer, netfs->name, klen);
+	return klen;
+}
+
+/*****************************************************************************/
+/*
+ * get the auxilliary data for an FSDEF index record
+ */
+static uint16_t fscache_fsdef_netfs_get_aux(const void *cookie_netfs_data,
+					    void *buffer, uint16_t bufmax)
+{
+	const struct fscache_netfs *netfs = cookie_netfs_data;
+	unsigned dlen;
+
+	_enter("{%s.%u},", netfs->name, netfs->version);
+
+	dlen = sizeof(uint32_t);
+	if (dlen > bufmax)
+		return 0;
+
+	memcpy(buffer, &netfs->version, dlen);
+	return dlen;
+}
+
+/*****************************************************************************/
+/*
+ * check that the version stored in the auxilliary data is correct
+ */
+static fscache_checkaux_t fscache_fsdef_netfs_check_aux(void *cookie_netfs_data,
+							const void *data,
+							uint16_t datalen)
+{
+	struct fscache_netfs *netfs = cookie_netfs_data;
+	uint32_t version;
+
+	_enter("{%s},,%hu", netfs->name, datalen);
+
+	if (datalen != sizeof(version)) {
+		_leave(" = OBSOLETE [dl=%d v=%d]",
+		       datalen, sizeof(version));
+		return FSCACHE_CHECKAUX_OBSOLETE;
+	}
+
+	memcpy(&version, data, sizeof(version));
+	if (version != netfs->version) {
+		_leave(" = OBSOLETE [ver=%x net=%x]",
+		       version, netfs->version);
+		return FSCACHE_CHECKAUX_OBSOLETE;
+	}
+
+	_leave(" = OKAY");
+	return FSCACHE_CHECKAUX_OKAY;
+}
diff --git a/fs/fscache/main.c b/fs/fscache/main.c
new file mode 100644
index 0000000..d73068c
--- /dev/null
+++ b/fs/fscache/main.c
@@ -0,0 +1,105 @@
+/* main.c: general filesystem caching manager
+ *
+ * Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/sched.h>
+#include <linux/completion.h>
+#include <linux/slab.h>
+#include "fscache-int.h"
+
+int fscache_debug;
+
+static int fscache_init(void);
+static void fscache_exit(void);
+
+fs_initcall(fscache_init);
+module_exit(fscache_exit);
+
+MODULE_DESCRIPTION("FS Cache Manager");
+MODULE_AUTHOR("Red Hat, Inc.");
+MODULE_LICENSE("GPL");
+
+static void fscache_ktype_release(struct kobject *kobject);
+
+static struct sysfs_ops fscache_sysfs_ops = {
+	.show		= NULL,
+	.store		= NULL,
+};
+
+static struct kobj_type fscache_ktype = {
+	.release	= fscache_ktype_release,
+	.sysfs_ops	= &fscache_sysfs_ops,
+	.default_attrs	= NULL,
+};
+
+struct kset fscache_kset = {
+	.kobj.name	= "fscache",
+	.kobj.kset	= &fs_subsys.kset,
+	.ktype		= &fscache_ktype,
+};
+
+EXPORT_SYMBOL(fscache_kset);
+
+/*****************************************************************************/
+/*
+ * initialise the fs caching module
+ */
+static int __init fscache_init(void)
+{
+	int ret;
+
+	fscache_cookie_jar =
+		kmem_cache_create("fscache_cookie_jar",
+				  sizeof(struct fscache_cookie),
+				  0,
+				  0,
+				  fscache_cookie_init_once,
+				  NULL);
+
+	if (!fscache_cookie_jar) {
+		printk(KERN_NOTICE
+		       "FS-Cache: Failed to allocate a cookie jar\n");
+		return -ENOMEM;
+	}
+
+	ret = kset_register(&fscache_kset);
+	if (ret < 0) {
+		kmem_cache_destroy(fscache_cookie_jar);
+		return ret;
+	}
+
+	printk(KERN_NOTICE "FS-Cache: Loaded\n");
+	return 0;
+
+}
+
+/*****************************************************************************/
+/*
+ * clean up on module removal
+ */
+static void __exit fscache_exit(void)
+{
+	_enter("");
+
+	kset_unregister(&fscache_kset);
+	kmem_cache_destroy(fscache_cookie_jar);
+	printk(KERN_NOTICE "FS-Cache: unloaded\n");
+
+}
+
+/*****************************************************************************/
+/*
+ * release the ktype
+ */
+static void fscache_ktype_release(struct kobject *kobject)
+{
+}
diff --git a/fs/fscache/page.c b/fs/fscache/page.c
new file mode 100644
index 0000000..fbb7716
--- /dev/null
+++ b/fs/fscache/page.c
@@ -0,0 +1,537 @@
+/* page.c: general filesystem cache cookie management
+ *
+ * Copyright (C) 2004-5 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/fscache-cache.h>
+#include <linux/buffer_head.h>
+#include <linux/pagevec.h>
+#include "fscache-int.h"
+
+/*****************************************************************************/
+/*
+ * set the data file size on an object in the cache
+ */
+int __fscache_set_i_size(struct fscache_cookie *cookie, loff_t i_size)
+{
+	struct fscache_object *object;
+	int ret;
+
+	_enter("%p,%llu,", cookie, i_size);
+
+	if (hlist_empty(&cookie->backing_objects)) {
+		_leave(" = -ENOBUFS");
+		return -ENOBUFS;
+	}
+
+	/* not supposed to use this for indexes */
+	BUG_ON(cookie->def->type == FSCACHE_COOKIE_TYPE_INDEX);
+
+	/* prevent the file from being uncached whilst we access it and exclude
+	 * read and write attempts on pages
+	 */
+	down_write(&cookie->sem);
+
+	ret = -ENOBUFS;
+	if (!hlist_empty(&cookie->backing_objects)) {
+		/* get and pin the backing object */
+		object = hlist_entry(cookie->backing_objects.first,
+				     struct fscache_object, cookie_link);
+
+		if (test_bit(FSCACHE_IOERROR, &object->cache->flags))
+			goto out;
+
+		/* prevent the cache from being withdrawn */
+		if (object->cache->ops->set_i_size &&
+		    fscache_operation_lock(object)
+		    ) {
+			if (object->cache->ops->grab_object(object)) {
+				/* ask the cache to honour the operation */
+				ret = object->cache->ops->set_i_size(object,
+								     i_size);
+
+				object->cache->ops->put_object(object);
+			}
+
+			fscache_operation_unlock(object);
+		}
+	}
+
+out:
+	up_write(&cookie->sem);
+	_leave(" = %d", ret);
+	return ret;
+}
+
+EXPORT_SYMBOL(__fscache_set_i_size);
+
+/*****************************************************************************/
+/*
+ * reserve space for an object
+ */
+int __fscache_reserve_space(struct fscache_cookie *cookie, loff_t size)
+{
+	struct fscache_object *object;
+	int ret;
+
+	_enter("%p,%llu,", cookie, size);
+
+	if (hlist_empty(&cookie->backing_objects)) {
+		_leave(" = -ENOBUFS");
+		return -ENOBUFS;
+	}
+
+	/* not supposed to use this for indexes */
+	BUG_ON(cookie->def->type == FSCACHE_COOKIE_TYPE_INDEX);
+
+	/* prevent the file from being uncached whilst we access it and exclude
+	 * read and write attempts on pages
+	 */
+	down_write(&cookie->sem);
+
+	ret = -ENOBUFS;
+	if (!hlist_empty(&cookie->backing_objects)) {
+		/* get and pin the backing object */
+		object = hlist_entry(cookie->backing_objects.first,
+				     struct fscache_object, cookie_link);
+
+		if (test_bit(FSCACHE_IOERROR, &object->cache->flags))
+			goto out;
+
+		if (!object->cache->ops->reserve_space) {
+			ret = -EOPNOTSUPP;
+			goto out;
+		}
+
+		/* prevent the cache from being withdrawn */
+		if (fscache_operation_lock(object)) {
+			if (object->cache->ops->grab_object(object)) {
+				/* ask the cache to honour the operation */
+				ret = object->cache->ops->reserve_space(object,
+									size);
+
+				object->cache->ops->put_object(object);
+			}
+
+			fscache_operation_unlock(object);
+		}
+	}
+
+out:
+	up_write(&cookie->sem);
+	_leave(" = %d", ret);
+	return ret;
+}
+
+EXPORT_SYMBOL(__fscache_reserve_space);
+
+/*****************************************************************************/
+/*
+ * read a page from the cache or allocate a block in which to store it
+ * - we return:
+ *   -ENOMEM	- out of memory, nothing done
+ *   -EINTR	- interrupted
+ *   -ENOBUFS	- no backing object available in which to cache the block
+ *   -ENODATA	- no data available in the backing object for this block
+ *   0		- dispatched a read - it'll call end_io_func() when finished
+ */
+int __fscache_read_or_alloc_page(struct fscache_cookie *cookie,
+				 struct page *page,
+				 fscache_rw_complete_t end_io_func,
+				 void *context,
+				 gfp_t gfp)
+{
+	struct fscache_object *object;
+	int ret;
+
+	_enter("%p,{%lu},", cookie, page->index);
+
+	if (hlist_empty(&cookie->backing_objects)) {
+		_leave(" -ENOBUFS [no backing objects]");
+		return -ENOBUFS;
+	}
+
+	/* not supposed to use this for indexes */
+	BUG_ON(cookie->def->type == FSCACHE_COOKIE_TYPE_INDEX);
+
+	/* prevent the file from being uncached whilst we access it */
+	down_read(&cookie->sem);
+
+	ret = -ENOBUFS;
+	if (!hlist_empty(&cookie->backing_objects)) {
+		/* get and pin the backing object */
+		object = hlist_entry(cookie->backing_objects.first,
+				     struct fscache_object, cookie_link);
+
+		if (test_bit(FSCACHE_IOERROR, &object->cache->flags))
+			goto out;
+
+		/* prevent the cache from being withdrawn */
+		if (fscache_operation_lock(object)) {
+			if (object->cache->ops->grab_object(object)) {
+				/* ask the cache to honour the operation */
+				ret = object->cache->ops->read_or_alloc_page(
+					object,
+					page,
+					end_io_func,
+					context,
+					gfp);
+
+				object->cache->ops->put_object(object);
+			}
+
+			fscache_operation_unlock(object);
+		}
+	}
+
+out:
+	up_read(&cookie->sem);
+	_leave(" = %d", ret);
+	return ret;
+}
+
+EXPORT_SYMBOL(__fscache_read_or_alloc_page);
+
+/*****************************************************************************/
+/*
+ * read a list of page from the cache or allocate a block in which to store
+ * them
+ * - we return:
+ *   -ENOMEM	- out of memory, some pages may be being read
+ *   -EINTR	- interrupted, some pages may be being read
+ *   -ENOBUFS	- no backing object or space available in which to cache any
+ *                pages not being read
+ *   -ENODATA	- no data available in the backing object for some or all of
+ *                the pages
+ *   0		- dispatched a read on all pages
+ *
+ * end_io_func() will be called for each page read from the cache as it is
+ * finishes being read
+ *
+ * any pages for which a read is dispatched will be removed from pages and
+ * nr_pages
+ */
+int __fscache_read_or_alloc_pages(struct fscache_cookie *cookie,
+				  struct address_space *mapping,
+				  struct list_head *pages,
+				  unsigned *nr_pages,
+				  fscache_rw_complete_t end_io_func,
+				  void *context,
+				  gfp_t gfp)
+{
+	struct fscache_object *object;
+	int ret;
+
+	_enter("%p,,%d,,,", cookie, *nr_pages);
+
+	if (hlist_empty(&cookie->backing_objects)) {
+		_leave(" -ENOBUFS [no backing objects]");
+		return -ENOBUFS;
+	}
+
+	/* not supposed to use this for indexes */
+	BUG_ON(cookie->def->type == FSCACHE_COOKIE_TYPE_INDEX);
+	BUG_ON(list_empty(pages));
+	BUG_ON(*nr_pages <= 0);
+
+	/* prevent the file from being uncached whilst we access it */
+	down_read(&cookie->sem);
+
+	ret = -ENOBUFS;
+	if (!hlist_empty(&cookie->backing_objects)) {
+		/* get and pin the backing object */
+		object = hlist_entry(cookie->backing_objects.first,
+				     struct fscache_object, cookie_link);
+
+		if (test_bit(FSCACHE_IOERROR, &object->cache->flags))
+			goto out;
+
+		/* prevent the cache from being withdrawn */
+		if (fscache_operation_lock(object)) {
+			if (object->cache->ops->grab_object(object)) {
+				/* ask the cache to honour the operation */
+				ret = object->cache->ops->read_or_alloc_pages(
+					object,
+					mapping,
+					pages,
+					nr_pages,
+					end_io_func,
+					context,
+					gfp);
+
+				object->cache->ops->put_object(object);
+			}
+
+			fscache_operation_unlock(object);
+		}
+	}
+
+out:
+	up_read(&cookie->sem);
+	_leave(" = %d", ret);
+	return ret;
+}
+
+EXPORT_SYMBOL(__fscache_read_or_alloc_pages);
+
+/*****************************************************************************/
+/*
+ * allocate a block in the cache on which to store a page
+ * - we return:
+ *   -ENOMEM	- out of memory, nothing done
+ *   -EINTR	- interrupted
+ *   -ENOBUFS	- no backing object available in which to cache the block
+ *   0		- block allocated
+ */
+int __fscache_alloc_page(struct fscache_cookie *cookie,
+			 struct page *page,
+			 gfp_t gfp)
+{
+	struct fscache_object *object;
+	int ret;
+
+	_enter("%p,{%lu},", cookie, page->index);
+
+	if (hlist_empty(&cookie->backing_objects)) {
+		_leave(" -ENOBUFS [no backing objects]");
+		return -ENOBUFS;
+	}
+
+	/* not supposed to use this for indexes */
+	BUG_ON(cookie->def->type == FSCACHE_COOKIE_TYPE_INDEX);
+
+	/* prevent the file from being uncached whilst we access it */
+	down_read(&cookie->sem);
+
+	ret = -ENOBUFS;
+	if (!hlist_empty(&cookie->backing_objects)) {
+		/* get and pin the backing object */
+		object = hlist_entry(cookie->backing_objects.first,
+				     struct fscache_object, cookie_link);
+
+		if (test_bit(FSCACHE_IOERROR, &object->cache->flags))
+			goto out;
+
+		/* prevent the cache from being withdrawn */
+		if (fscache_operation_lock(object)) {
+			if (object->cache->ops->grab_object(object)) {
+				/* ask the cache to honour the operation */
+				ret = object->cache->ops->allocate_page(object,
+									page,
+									gfp);
+
+				object->cache->ops->put_object(object);
+			}
+
+			fscache_operation_unlock(object);
+		}
+	}
+
+out:
+	up_read(&cookie->sem);
+	_leave(" = %d", ret);
+	return ret;
+}
+
+EXPORT_SYMBOL(__fscache_alloc_page);
+
+/*****************************************************************************/
+/*
+ * request a page be stored in the cache
+ * - returns:
+ *   -ENOMEM	- out of memory, nothing done
+ *   -EINTR	- interrupted
+ *   -ENOBUFS	- no backing object available in which to cache the page
+ *   0		- dispatched a write - it'll call end_io_func() when finished
+ */
+int __fscache_write_page(struct fscache_cookie *cookie,
+			 struct page *page,
+			 fscache_rw_complete_t end_io_func,
+			 void *context,
+			 gfp_t gfp)
+{
+	struct fscache_object *object;
+	int ret;
+
+	_enter("%p,{%lu},", cookie, page->index);
+
+	/* not supposed to use this for indexes */
+	BUG_ON(cookie->def->type == FSCACHE_COOKIE_TYPE_INDEX);
+
+	/* prevent the file from been uncached whilst we deal with it */
+	down_read(&cookie->sem);
+
+	ret = -ENOBUFS;
+	if (!hlist_empty(&cookie->backing_objects)) {
+		object = hlist_entry(cookie->backing_objects.first,
+				     struct fscache_object, cookie_link);
+
+		if (test_bit(FSCACHE_IOERROR, &object->cache->flags))
+			goto out;
+
+		/* prevent the cache from being withdrawn */
+		if (fscache_operation_lock(object)) {
+			/* ask the cache to honour the operation */
+			ret = object->cache->ops->write_page(object,
+							     page,
+							     end_io_func,
+							     context,
+							     gfp);
+			fscache_operation_unlock(object);
+		}
+	}
+
+out:
+	up_read(&cookie->sem);
+	_leave(" = %d", ret);
+	return ret;
+}
+
+EXPORT_SYMBOL(__fscache_write_page);
+
+/*****************************************************************************/
+/*
+ * request several pages be stored in the cache
+ * - returns:
+ *   -ENOMEM	- out of memory, nothing done
+ *   -EINTR	- interrupted
+ *   -ENOBUFS	- no backing object available in which to cache the page
+ *   0		- dispatched a write - it'll call end_io_func() when finished
+ */
+int __fscache_write_pages(struct fscache_cookie *cookie,
+			  struct pagevec *pagevec,
+			  fscache_rw_complete_t end_io_func,
+			  void *context,
+			  gfp_t gfp)
+{
+	struct fscache_object *object;
+	int ret;
+
+	_enter("%p,{%ld},", cookie, pagevec->nr);
+
+	/* not supposed to use this for indexes */
+	BUG_ON(cookie->def->type == FSCACHE_COOKIE_TYPE_INDEX);
+
+	/* prevent the file from been uncached whilst we deal with it */
+	down_read(&cookie->sem);
+
+	ret = -ENOBUFS;
+	if (!hlist_empty(&cookie->backing_objects)) {
+		object = hlist_entry(cookie->backing_objects.first,
+				     struct fscache_object, cookie_link);
+
+		if (test_bit(FSCACHE_IOERROR, &object->cache->flags))
+			goto out;
+
+		/* prevent the cache from being withdrawn */
+		if (fscache_operation_lock(object)) {
+			/* ask the cache to honour the operation */
+			ret = object->cache->ops->write_pages(object,
+							      pagevec,
+							      end_io_func,
+							      context,
+							      gfp);
+			fscache_operation_unlock(object);
+		}
+	}
+
+out:
+	up_read(&cookie->sem);
+	_leave(" = %d", ret);
+	return ret;
+}
+
+EXPORT_SYMBOL(__fscache_write_pages);
+
+/*****************************************************************************/
+/*
+ * remove a page from the cache
+ */
+void __fscache_uncache_page(struct fscache_cookie *cookie, struct page *page)
+{
+	struct fscache_object *object;
+	struct pagevec pagevec;
+
+	_enter(",{%lu}", page->index);
+
+	/* not supposed to use this for indexes */
+	BUG_ON(cookie->def->type == FSCACHE_COOKIE_TYPE_INDEX);
+
+	if (hlist_empty(&cookie->backing_objects)) {
+		_leave(" [no backing]");
+		return;
+	}
+
+	pagevec_init(&pagevec, 0);
+	pagevec_add(&pagevec, page);
+
+	/* ask the cache to honour the operation */
+	down_read(&cookie->sem);
+
+	if (!hlist_empty(&cookie->backing_objects)) {
+		object = hlist_entry(cookie->backing_objects.first,
+				     struct fscache_object, cookie_link);
+
+		/* prevent the cache from being withdrawn */
+		if (fscache_operation_lock(object)) {
+			object->cache->ops->uncache_pages(object, &pagevec);
+			fscache_operation_unlock(object);
+		}
+	}
+
+	up_read(&cookie->sem);
+
+	_leave("");
+}
+
+EXPORT_SYMBOL(__fscache_uncache_page);
+
+/*****************************************************************************/
+/*
+ * remove a bunch of pages from the cache
+ */
+void __fscache_uncache_pages(struct fscache_cookie *cookie,
+			     struct pagevec *pagevec)
+{
+	struct fscache_object *object;
+
+	_enter(",{%ld}", pagevec->nr);
+
+	BUG_ON(pagevec->nr <= 0);
+	BUG_ON(!pagevec->pages[0]);
+
+	/* not supposed to use this for indexes */
+	BUG_ON(cookie->def->type == FSCACHE_COOKIE_TYPE_INDEX);
+
+	if (hlist_empty(&cookie->backing_objects)) {
+		_leave(" [no backing]");
+		return;
+	}
+
+	/* ask the cache to honour the operation */
+	down_read(&cookie->sem);
+
+	if (!hlist_empty(&cookie->backing_objects)) {
+		object = hlist_entry(cookie->backing_objects.first,
+				     struct fscache_object, cookie_link);
+
+		/* prevent the cache from being withdrawn */
+		if (fscache_operation_lock(object)) {
+			object->cache->ops->uncache_pages(object, pagevec);
+			fscache_operation_unlock(object);
+		}
+	}
+
+	up_read(&cookie->sem);
+
+	_leave("");
+}
+
+EXPORT_SYMBOL(__fscache_uncache_pages);
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
new file mode 100644
index 0000000..e229fed
--- /dev/null
+++ b/include/linux/fscache-cache.h
@@ -0,0 +1,243 @@
+/* fscache-cache.h: general filesystem caching backing cache interface
+ *
+ * Copyright (C) 2004-6 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * NOTE!!! See:
+ *
+ *	Documentation/filesystems/caching/backend-api.txt
+ *
+ * for a description of the cache backend interface declared here.
+ */
+
+#ifndef _LINUX_FSCACHE_CACHE_H
+#define _LINUX_FSCACHE_CACHE_H
+
+#include <linux/fscache.h>
+
+#define NR_MAXCACHES BITS_PER_LONG
+
+struct fscache_cache;
+struct fscache_cache_ops;
+struct fscache_object;
+
+/*
+ * cache tag definition
+ */
+struct fscache_cache_tag {
+	struct list_head		link;
+	struct fscache_cache		*cache;		/* cache referred to by this tag */
+	atomic_t			usage;
+	char				name[0];	/* tag name */
+};
+
+/*
+ * cache definition
+ */
+struct fscache_cache {
+	struct fscache_cache_ops	*ops;
+	struct fscache_cache_tag	*tag;		/* tag representing this cache */
+	struct list_head		link;		/* link in list of caches */
+	struct rw_semaphore		withdrawal_sem;	/* withdrawal control sem */
+	size_t				max_index_size;	/* maximum size of index data */
+	char				identifier[32];	/* cache label */
+
+	/* node management */
+	struct list_head		object_list;	/* list of data/index objects */
+	spinlock_t			object_list_lock;
+	struct fscache_object		*fsdef;		/* object for the fsdef index */
+	unsigned long			flags;
+#define FSCACHE_IOERROR			0	/* cache stopped on I/O error */
+};
+
+extern void fscache_init_cache(struct fscache_cache *cache,
+			       struct fscache_cache_ops *ops,
+			       const char *idfmt,
+			       ...) __attribute__ ((format (printf,3,4)));
+
+extern int fscache_add_cache(struct fscache_cache *cache,
+			     struct fscache_object *fsdef,
+			     const char *tagname);
+extern void fscache_withdraw_cache(struct fscache_cache *cache);
+
+extern void fscache_io_error(struct fscache_cache *cache);
+
+/*****************************************************************************/
+/*
+ * cache operations
+ */
+struct fscache_cache_ops {
+	/* name of cache provider */
+	const char *name;
+
+	/* look up the object for a cookie, creating it on disc if necessary */
+	struct fscache_object *(*lookup_object)(struct fscache_cache *cache,
+						struct fscache_object *parent,
+						struct fscache_cookie *cookie);
+
+	/* increment the usage count on this object (may fail if unmounting) */
+	struct fscache_object *(*grab_object)(struct fscache_object *object);
+
+	/* lock a semaphore on an object */
+	void (*lock_object)(struct fscache_object *object);
+
+	/* unlock a semaphore on an object */
+	void (*unlock_object)(struct fscache_object *object);
+
+	/* pin an object in the cache */
+	int (*pin_object)(struct fscache_object *object);
+
+	/* unpin an object in the cache */
+	void (*unpin_object)(struct fscache_object *object);
+
+	/* store the updated auxilliary data on an object */
+	void (*update_object)(struct fscache_object *object);
+
+	/* dispose of a reference to an object */
+	void (*put_object)(struct fscache_object *object);
+
+	/* sync a cache */
+	void (*sync_cache)(struct fscache_cache *cache);
+
+	/* set the data size of an object */
+	int (*set_i_size)(struct fscache_object *object, loff_t i_size);
+
+	/* reserve space for an object's data and associated metadata */
+	int (*reserve_space)(struct fscache_object *object, loff_t i_size);
+
+	/* request a backing block for a page be read or allocated in the
+	 * cache */
+	int (*read_or_alloc_page)(struct fscache_object *object,
+				  struct page *page,
+				  fscache_rw_complete_t end_io_func,
+				  void *context,
+				  unsigned long gfp);
+
+	/* request backing blocks for a list of pages be read or allocated in
+	 * the cache */
+	int (*read_or_alloc_pages)(struct fscache_object *object,
+				   struct address_space *mapping,
+				   struct list_head *pages,
+				   unsigned *nr_pages,
+				   fscache_rw_complete_t end_io_func,
+				   void *context,
+				   unsigned long gfp);
+
+	/* request a backing block for a page be allocated in the cache so that
+	 * it can be written directly */
+	int (*allocate_page)(struct fscache_object *object,
+			     struct page *page,
+			     unsigned long gfp);
+
+	/* write a page to its backing block in the cache */
+	int (*write_page)(struct fscache_object *object,
+			  struct page *page,
+			  fscache_rw_complete_t end_io_func,
+			  void *context,
+			  unsigned long gfp);
+
+	/* write several pages to their backing blocks in the cache */
+	int (*write_pages)(struct fscache_object *object,
+			   struct pagevec *pagevec,
+			   fscache_rw_complete_t end_io_func,
+			   void *context,
+			   unsigned long gfp);
+
+	/* detach backing block from a bunch of pages */
+	void (*uncache_pages)(struct fscache_object *object,
+			     struct pagevec *pagevec);
+
+	/* dissociate a cache from all the pages it was backing */
+	void (*dissociate_pages)(struct fscache_cache *cache);
+};
+
+/*****************************************************************************/
+/*
+ * data file or index object cookie
+ * - a file will only appear in one cache
+ * - a request to cache a file may or may not be honoured, subject to
+ *   constraints such as disc space
+ * - indexes files are created on disc just-in-time
+ */
+struct fscache_cookie {
+	atomic_t			usage;		/* number of users of this cookie */
+	atomic_t			children;	/* number of children of this cookie */
+	struct rw_semaphore		sem;		/* list creation vs scan lock */
+	struct hlist_head		backing_objects; /* object(s) backing this file/index */
+	struct fscache_cookie_def	*def;		/* definition */
+	struct fscache_cookie		*parent;	/* parent of this entry */
+	struct fscache_netfs		*netfs;		/* owner network fs definition */
+	void				*netfs_data;	/* back pointer to netfs */
+};
+
+extern struct fscache_cookie fscache_fsdef_index;
+
+/*****************************************************************************/
+/*
+ * on-disc cache file or index handle
+ */
+struct fscache_object {
+	unsigned long			flags;
+#define FSCACHE_OBJECT_RELEASING	0	/* T if object is being released */
+#define FSCACHE_OBJECT_RECYCLING	1	/* T if object is being retired */
+#define FSCACHE_OBJECT_WITHDRAWN	2	/* T if object has been withdrawn */
+
+	struct list_head		cache_link;	/* link in cache->object_list */
+	struct hlist_node		cookie_link;	/* link in cookie->backing_objects */
+	struct fscache_cache		*cache;		/* cache that supplied this object */
+	struct fscache_cookie		*cookie;	/* netfs's file/index object */
+};
+
+static inline
+void fscache_object_init(struct fscache_object *object)
+{
+	object->flags = 0;
+	INIT_LIST_HEAD(&object->cache_link);
+	INIT_HLIST_NODE(&object->cookie_link);
+	object->cache = NULL;
+	object->cookie = NULL;
+}
+
+/* find the parent index object for a object */
+static inline
+struct fscache_object *fscache_find_parent_object(struct fscache_object *object)
+{
+	struct fscache_object *parent;
+	struct fscache_cookie *cookie = object->cookie;
+	struct fscache_cache *cache = object->cache;
+	struct hlist_node *_p;
+
+	hlist_for_each_entry(parent, _p,
+			     &cookie->parent->backing_objects,
+			     cookie_link
+			     ) {
+		if (parent->cache == cache)
+			return parent;
+	}
+
+	return NULL;
+}
+
+/* get an extra reference to a context */
+static inline
+void *fscache_get_context(struct fscache_cookie *cookie, void *context)
+{
+	if (cookie->def->get_context)
+		cookie->def->get_context(cookie->netfs_data, context);
+	return context;
+}
+
+/* release an extra reference to a context */
+static inline
+void fscache_put_context(struct fscache_cookie *cookie, void *context)
+{
+	if (cookie->def->put_context)
+		cookie->def->put_context(cookie->netfs_data, context);
+}
+
+#endif /* _LINUX_FSCACHE_CACHE_H */
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
new file mode 100644
index 0000000..b9697f9
--- /dev/null
+++ b/include/linux/fscache.h
@@ -0,0 +1,496 @@
+/* fscache.h: general filesystem caching interface
+ *
+ * Copyright (C) 2004-5 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * NOTE!!! See:
+ *
+ *	Documentation/filesystems/caching/netfs-api.txt
+ *
+ * for a description of the network filesystem interface declared here.
+ */
+
+#ifndef _LINUX_FSCACHE_H
+#define _LINUX_FSCACHE_H
+
+#include <linux/config.h>
+#include <linux/fs.h>
+#include <linux/list.h>
+#include <linux/pagemap.h>
+#include <linux/pagevec.h>
+
+struct pagevec;
+struct fscache_cache_tag;
+struct fscache_cookie;
+struct fscache_netfs;
+struct fscache_netfs_operations;
+
+typedef void (*fscache_rw_complete_t)(struct page *page,
+				      void *context,
+				      int error);
+
+/* result of index entry consultation */
+typedef enum {
+	FSCACHE_CHECKAUX_OKAY,		/* entry okay as is */
+	FSCACHE_CHECKAUX_NEEDS_UPDATE,	/* entry requires update */
+	FSCACHE_CHECKAUX_OBSOLETE,	/* entry requires deletion */
+} fscache_checkaux_t;
+
+/*****************************************************************************/
+/*
+ * fscache cookie definition
+ */
+struct fscache_cookie_def
+{
+	/* name of cookie type */
+	char name[16];
+
+	/* cookie type */
+	uint8_t type;
+#define FSCACHE_COOKIE_TYPE_INDEX	0
+#define FSCACHE_COOKIE_TYPE_DATAFILE	1
+
+	/* select the cache into which to insert an entry in this index
+	 * - optional
+	 * - should return a cache identifier or NULL to cause the cache to be
+	 *   inherited from the parent if possible or the first cache picked
+	 *   for a non-index file if not
+	 */
+	struct fscache_cache_tag *(*select_cache)(const void *parent_netfs_data,
+						  const void *cookie_netfs_data);
+
+	/* get an index key
+	 * - should store the key data in the buffer
+	 * - should return the amount of amount stored
+	 * - not permitted to return an error
+	 * - the netfs data from the cookie being used as the source is
+	 *   presented
+	 */
+	uint16_t (*get_key)(const void *cookie_netfs_data,
+			    void *buffer,
+			    uint16_t bufmax);
+
+	/* get certain file attributes from the netfs data
+	 * - this function can be absent for an index
+	 * - not permitted to return an error
+	 * - the netfs data from the cookie being used as the source is
+	 *   presented
+	 */
+	void (*get_attr)(const void *cookie_netfs_data, uint64_t *size);
+
+	/* get the auxilliary data from netfs data
+	 * - this function can be absent if the index carries no state data
+	 * - should store the auxilliary data in the buffer
+	 * - should return the amount of amount stored
+	 * - not permitted to return an error
+	 * - the netfs data from the cookie being used as the source is
+	 *   presented
+	 */
+	uint16_t (*get_aux)(const void *cookie_netfs_data,
+			    void *buffer,
+			    uint16_t bufmax);
+
+	/* consult the netfs about the state of an object
+	 * - this function can be absent if the index carries no state data
+	 * - the netfs data from the cookie being used as the target is
+	 *   presented, as is the auxilliary data
+	 */
+	fscache_checkaux_t (*check_aux)(void *cookie_netfs_data,
+					const void *data,
+					uint16_t datalen);
+
+	/* get an extra reference on a read context
+	 * - this function can be absent if the completion function doesn't
+	 *   require a context
+	 */
+	void (*get_context)(void *cookie_netfs_data, void *context);
+
+	/* release an extra reference on a read context
+	 * - this function can be absent if the completion function doesn't
+	 *   require a context
+	 */
+	void (*put_context)(void *cookie_netfs_data, void *context);
+
+	/* indicate pages that now have cache metadata retained
+	 * - this function should mark the specified pages as now being cached
+	 */
+	void (*mark_pages_cached)(void *cookie_netfs_data,
+				  struct address_space *mapping,
+				  struct pagevec *cached_pvec);
+
+	/* indicate the cookie is no longer cached
+	 * - this function is called when the backing store currently caching
+	 *   a cookie is removed
+	 * - the netfs should use this to clean up any markers indicating
+	 *   cached pages
+	 * - this is mandatory for any object that may have data
+	 */
+	void (*now_uncached)(void *cookie_netfs_data);
+};
+
+/* pattern used to fill dead space in an index entry */
+#define FSCACHE_INDEX_DEADFILL_PATTERN 0x79
+
+#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+extern struct fscache_cookie *__fscache_acquire_cookie(struct fscache_cookie *parent,
+						       struct fscache_cookie_def *def,
+						       void *netfs_data);
+
+extern void __fscache_relinquish_cookie(struct fscache_cookie *cookie,
+					int retire);
+
+extern void __fscache_update_cookie(struct fscache_cookie *cookie);
+#endif
+
+static inline
+struct fscache_cookie *fscache_acquire_cookie(struct fscache_cookie *parent,
+					      struct fscache_cookie_def *def,
+					      void *netfs_data)
+{
+#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+	if (parent)
+		return __fscache_acquire_cookie(parent, def, netfs_data);
+#endif
+	return NULL;
+}
+
+static inline
+void fscache_relinquish_cookie(struct fscache_cookie *cookie,
+			       int retire)
+{
+#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+	if (cookie)
+		__fscache_relinquish_cookie(cookie, retire);
+#endif
+}
+
+static inline
+void fscache_update_cookie(struct fscache_cookie *cookie)
+{
+#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+	if (cookie)
+		__fscache_update_cookie(cookie);
+#endif
+}
+
+/*****************************************************************************/
+/*
+ * pin or unpin a cookie in a cache
+ * - only available for data cookies
+ */
+#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+extern int __fscache_pin_cookie(struct fscache_cookie *cookie);
+extern void __fscache_unpin_cookie(struct fscache_cookie *cookie);
+#endif
+
+static inline
+int fscache_pin_cookie(struct fscache_cookie *cookie)
+{
+#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+	if (cookie)
+		return __fscache_pin_cookie(cookie);
+#endif
+	return -ENOBUFS;
+}
+
+static inline
+void fscache_unpin_cookie(struct fscache_cookie *cookie)
+{
+#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+	if (cookie)
+		__fscache_unpin_cookie(cookie);
+#endif
+}
+
+/*****************************************************************************/
+/*
+ * fscache cached network filesystem type
+ * - name, version and ops must be filled in before registration
+ * - all other fields will be set during registration
+ */
+struct fscache_netfs
+{
+	uint32_t			version;	/* indexing version */
+	const char			*name;		/* filesystem name */
+	struct fscache_cookie		*primary_index;
+	struct fscache_netfs_operations	*ops;
+	struct list_head		link;		/* internal link */
+};
+
+struct fscache_netfs_operations
+{
+};
+
+#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+extern int __fscache_register_netfs(struct fscache_netfs *netfs);
+extern void __fscache_unregister_netfs(struct fscache_netfs *netfs);
+#endif
+
+static inline
+int fscache_register_netfs(struct fscache_netfs *netfs)
+{
+#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+	return __fscache_register_netfs(netfs);
+#else
+	return 0;
+#endif
+}
+
+static inline
+void fscache_unregister_netfs(struct fscache_netfs *netfs)
+{
+#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+	__fscache_unregister_netfs(netfs);
+#endif
+}
+
+/*****************************************************************************/
+/*
+ * look up a cache tag
+ * - cache tags are used to select specific caches in which to cache indexes
+ */
+#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+extern struct fscache_cache_tag *__fscache_lookup_cache_tag(const char *name);
+extern void __fscache_release_cache_tag(struct fscache_cache_tag *tag);
+#endif
+
+static inline
+struct fscache_cache_tag *fscache_lookup_cache_tag(const char *name)
+{
+#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+	return __fscache_lookup_cache_tag(name);
+#else
+	return NULL;
+#endif
+}
+
+static inline
+void fscache_release_cache_tag(struct fscache_cache_tag *tag)
+{
+#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+	__fscache_release_cache_tag(tag);
+#endif
+}
+
+/*****************************************************************************/
+/*
+ * set the data size on a cached object
+ * - no pages beyond the end of the object will be accessible
+ * - returns -ENOBUFS if the file is not backed
+ * - returns -ENOSPC if a pinned file of that size can't be stored
+ * - returns 0 if okay
+ */
+#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+extern int __fscache_set_i_size(struct fscache_cookie *cookie, loff_t i_size);
+#endif
+
+static inline
+int fscache_set_i_size(struct fscache_cookie *cookie, loff_t i_size)
+{
+#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+	if (cookie)
+		return __fscache_set_i_size(cookie, i_size);
+#endif
+	return -ENOBUFS;
+}
+
+/*****************************************************************************/
+/*
+ * reserve data space for a cached object
+ * - returns -ENOBUFS if the file is not backed
+ * - returns -ENOSPC if there isn't enough space to honour the reservation
+ * - returns 0 if okay
+ */
+#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+extern int __fscache_reserve_space(struct fscache_cookie *cookie, loff_t size);
+#endif
+
+static inline
+int fscache_reserve_space(struct fscache_cookie *cookie, loff_t size)
+{
+#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+	if (cookie)
+		return __fscache_reserve_space(cookie, size);
+#endif
+	return -ENOBUFS;
+}
+
+/*****************************************************************************/
+/*
+ * read a page from the cache or allocate a block in which to store it
+ * - if the page is not backed by a file:
+ *   - -ENOBUFS will be returned and nothing more will be done
+ * - else if the page is backed by a block in the cache:
+ *   - a read will be started which will call end_io_func on completion
+ * - else if the page is unbacked:
+ *   - a block will be allocated
+ *   - -ENODATA will be returned
+ */
+#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+extern int __fscache_read_or_alloc_page(struct fscache_cookie *cookie,
+					struct page *page,
+					fscache_rw_complete_t end_io_func,
+					void *context,
+					gfp_t gfp);
+#endif
+
+static inline
+int fscache_read_or_alloc_page(struct fscache_cookie *cookie,
+			       struct page *page,
+			       fscache_rw_complete_t end_io_func,
+			       void *context,
+			       gfp_t gfp)
+{
+#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+	if (cookie)
+		return __fscache_read_or_alloc_page(cookie, page, end_io_func,
+						    context, gfp);
+#endif
+	return -ENOBUFS;
+}
+
+#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+extern int __fscache_read_or_alloc_pages(struct fscache_cookie *cookie,
+					 struct address_space *mapping,
+					 struct list_head *pages,
+					 unsigned *nr_pages,
+					 fscache_rw_complete_t end_io_func,
+					 void *context,
+					 gfp_t gfp);
+#endif
+
+static inline
+int fscache_read_or_alloc_pages(struct fscache_cookie *cookie,
+				struct address_space *mapping,
+				struct list_head *pages,
+				unsigned *nr_pages,
+				fscache_rw_complete_t end_io_func,
+				void *context,
+				gfp_t gfp)
+{
+#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+	if (cookie)
+		return __fscache_read_or_alloc_pages(cookie, mapping, pages,
+						     nr_pages, end_io_func,
+						     context, gfp);
+#endif
+	return -ENOBUFS;
+}
+
+/*
+ * allocate a block in which to store a page
+ * - if the page is not backed by a file:
+ *   - -ENOBUFS will be returned and nothing more will be done
+ * - else
+ *   - a block will be allocated if there isn't one
+ *   - 0 will be returned
+ */
+#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+extern int __fscache_alloc_page(struct fscache_cookie *cookie,
+				struct page *page,
+				gfp_t gfp);
+#endif
+
+static inline
+int fscache_alloc_page(struct fscache_cookie *cookie,
+		       struct page *page,
+		       gfp_t gfp)
+{
+#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+	if (cookie)
+		return __fscache_alloc_page(cookie, page, gfp);
+#endif
+	return -ENOBUFS;
+}
+
+/*
+ * request a page be stored in the cache
+ * - this request may be ignored if no cache block is currently allocated, in
+ *   which case it:
+ *   - returns -ENOBUFS
+ * - if a cache block was already allocated:
+ *   - a BIO will be dispatched to write the page (end_io_func will be called
+ *     from the completion function)
+ *   - returns 0
+ */
+#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+extern int __fscache_write_page(struct fscache_cookie *cookie,
+				struct page *page,
+				fscache_rw_complete_t end_io_func,
+				void *context,
+				gfp_t gfp);
+
+extern int __fscache_write_pages(struct fscache_cookie *cookie,
+				 struct pagevec *pagevec,
+				 fscache_rw_complete_t end_io_func,
+				 void *context,
+				 gfp_t gfp);
+#endif
+
+static inline
+int fscache_write_page(struct fscache_cookie *cookie,
+		       struct page *page,
+		       fscache_rw_complete_t end_io_func,
+		       void *context,
+		       gfp_t gfp)
+{
+#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+	if (cookie)
+		return __fscache_write_page(cookie, page, end_io_func,
+					    context, gfp);
+#endif
+	return -ENOBUFS;
+}
+
+static inline
+int fscache_write_pages(struct fscache_cookie *cookie,
+			struct pagevec *pagevec,
+			fscache_rw_complete_t end_io_func,
+			void *context,
+			gfp_t gfp)
+{
+#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+	if (cookie)
+		return __fscache_write_pages(cookie, pagevec, end_io_func,
+					     context, gfp);
+#endif
+	return -ENOBUFS;
+}
+
+/*
+ * indicate that caching is no longer required on a page
+ * - note: cannot cancel any outstanding BIOs between this page and the cache
+ */
+#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+extern void __fscache_uncache_page(struct fscache_cookie *cookie,
+				   struct page *page);
+extern void __fscache_uncache_pages(struct fscache_cookie *cookie,
+				    struct pagevec *pagevec);
+#endif
+
+static inline
+void fscache_uncache_page(struct fscache_cookie *cookie,
+			  struct page *page)
+{
+#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+	if (cookie)
+		__fscache_uncache_page(cookie, page);
+#endif
+}
+
+static inline
+void fscache_uncache_pagevec(struct fscache_cookie *cookie,
+			     struct pagevec *pagevec)
+{
+#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+	if (cookie)
+		__fscache_uncache_pages(cookie, pagevec);
+#endif
+}
+
+#endif /* _LINUX_FSCACHE_H */

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/7] FS-Cache: Release page->private after failed readahead [try #12]
  2006-08-18 15:35 [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #12] David Howells
  2006-08-18 15:35 ` [PATCH 1/7] FS-Cache: Provide a filesystem-specific sync'able page bit " David Howells
  2006-08-18 15:35 ` [PATCH 2/7] FS-Cache: Generic filesystem caching facility " David Howells
@ 2006-08-18 15:35 ` David Howells
  2006-08-18 15:35 ` [PATCH 4/7] FS-Cache: Make kAFS use FS-Cache " David Howells
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: David Howells @ 2006-08-18 15:35 UTC (permalink / raw)
  To: torvalds, akpm, steved, trond.myklebust
  Cc: linux-fsdevel, linux-cachefs, nfsv4, linux-kernel

The attached patch causes read_cache_pages() to release page-private data on a
page for which add_to_page_cache() fails or the filler function fails. This
permits pages with caching references associated with them to be cleaned up.

The invalidatepage() address space op is called (indirectly) to do the honours.

Signed-Off-By: David Howells <dhowells@redhat.com>
---

 mm/readahead.c |   46 ++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index aa7ec42..b0788ae 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -14,6 +14,7 @@ #include <linux/module.h>
 #include <linux/blkdev.h>
 #include <linux/backing-dev.h>
 #include <linux/pagevec.h>
+#include <linux/buffer_head.h>
 
 void default_unplug_io_fn(struct backing_dev_info *bdi, struct page *page)
 {
@@ -117,6 +118,41 @@ static inline unsigned long get_next_ra_
 
 #define list_to_page(head) (list_entry((head)->prev, struct page, lru))
 
+/*
+ * see if a page needs releasing upon read_cache_pages() failure
+ * - the caller of read_cache_pages() may have set PG_private before calling,
+ *   such as the NFS fs marking pages that are cached locally on disk, thus we
+ *   need to give the fs a chance to clean up in the event of an error
+ */
+static void read_cache_pages_invalidate_page(struct address_space *mapping,
+					     struct page *page)
+{
+	if (PagePrivate(page)) {
+		if (TestSetPageLocked(page))
+			BUG();
+		page->mapping = mapping;
+		do_invalidatepage(page, 0);
+		page->mapping = NULL;
+		unlock_page(page);
+	}
+	page_cache_release(page);
+}
+
+/*
+ * release a list of pages, invalidating them first if need be
+ */
+static void read_cache_pages_invalidate_pages(struct address_space *mapping,
+					      struct list_head *pages)
+{
+	struct page *victim;
+
+	while (!list_empty(pages)) {
+		victim = list_to_page(pages);
+		list_del(&victim->lru);
+		read_cache_pages_invalidate_page(mapping, victim);
+	}
+}
+
 /**
  * read_cache_pages - populate an address space with some pages & start reads against them
  * @mapping: the address_space
@@ -140,20 +176,14 @@ int read_cache_pages(struct address_spac
 		page = list_to_page(pages);
 		list_del(&page->lru);
 		if (add_to_page_cache(page, mapping, page->index, GFP_KERNEL)) {
-			page_cache_release(page);
+			read_cache_pages_invalidate_page(mapping, page);
 			continue;
 		}
 		ret = filler(data, page);
 		if (!pagevec_add(&lru_pvec, page))
 			__pagevec_lru_add(&lru_pvec);
 		if (ret) {
-			while (!list_empty(pages)) {
-				struct page *victim;
-
-				victim = list_to_page(pages);
-				list_del(&victim->lru);
-				page_cache_release(victim);
-			}
+			read_cache_pages_invalidate_pages(mapping, pages);
 			break;
 		}
 	}

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 4/7] FS-Cache: Make kAFS use FS-Cache [try #12]
  2006-08-18 15:35 [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #12] David Howells
                   ` (2 preceding siblings ...)
  2006-08-18 15:35 ` [PATCH 3/7] FS-Cache: Release page->private after failed readahead " David Howells
@ 2006-08-18 15:35 ` David Howells
  2006-08-18 15:35 ` [PATCH 5/7] NFS: Use local caching " David Howells
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: David Howells @ 2006-08-18 15:35 UTC (permalink / raw)
  To: torvalds, akpm, steved, trond.myklebust
  Cc: linux-fsdevel, linux-cachefs, nfsv4, linux-kernel

The attached patch makes the kAFS filesystem in fs/afs/ use FS-Cache, and
through it any attached caches.  The kAFS filesystem will use caching
automatically if it's available.

Signed-Off-By: David Howells <dhowells@redhat.com>
---

 fs/Kconfig         |    7 +
 fs/afs/cache.h     |   27 -----
 fs/afs/cell.c      |  109 +++++++++++++--------
 fs/afs/cell.h      |   16 +--
 fs/afs/cmservice.c |    2 
 fs/afs/dir.c       |   10 +-
 fs/afs/file.c      |  265 ++++++++++++++++++++++++++++++++++------------------
 fs/afs/fsclient.c  |    4 +
 fs/afs/inode.c     |   45 ++++++---
 fs/afs/internal.h  |   25 ++---
 fs/afs/main.c      |   24 ++---
 fs/afs/mntpt.c     |   12 +-
 fs/afs/proc.c      |    1 
 fs/afs/server.c    |    3 -
 fs/afs/vlocation.c |  179 ++++++++++++++++++++++-------------
 fs/afs/vnode.c     |  250 ++++++++++++++++++++++++++++++++++++++++---------
 fs/afs/vnode.h     |   10 +-
 fs/afs/volume.c    |   78 ++++++---------
 fs/afs/volume.h    |   28 +----
 19 files changed, 671 insertions(+), 424 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index 1a3d179..eecc0ed 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -1922,6 +1922,13 @@ # for fs/nls/Config.in
 
 	  If unsure, say N.
 
+config AFS_FSCACHE
+	bool "Provide AFS client caching support"
+	depends on AFS_FS && FSCACHE && EXPERIMENTAL
+	help
+	  Say Y here if you want AFS data to be cached locally on through the
+	  generic filesystem cache manager
+
 config RXRPC
 	tristate
 
diff --git a/fs/afs/cache.h b/fs/afs/cache.h
deleted file mode 100644
index 9eb7722..0000000
--- a/fs/afs/cache.h
+++ /dev/null
@@ -1,27 +0,0 @@
-/* cache.h: AFS local cache management interface
- *
- * Copyright (C) 2002 Red Hat, Inc. All Rights Reserved.
- * Written by David Howells (dhowells@redhat.com)
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-
-#ifndef _LINUX_AFS_CACHE_H
-#define _LINUX_AFS_CACHE_H
-
-#undef AFS_CACHING_SUPPORT
-
-#include <linux/mm.h>
-#ifdef AFS_CACHING_SUPPORT
-#include <linux/cachefs.h>
-#endif
-#include "types.h"
-
-#ifdef __KERNEL__
-
-#endif /* __KERNEL__ */
-
-#endif /* _LINUX_AFS_CACHE_H */
diff --git a/fs/afs/cell.c b/fs/afs/cell.c
index bfc1fd2..3aaeada 100644
--- a/fs/afs/cell.c
+++ b/fs/afs/cell.c
@@ -31,17 +31,21 @@ static DEFINE_RWLOCK(afs_cells_lock);
 static DECLARE_RWSEM(afs_cells_sem); /* add/remove serialisation */
 static struct afs_cell *afs_cell_root;
 
-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_cell_cache_match(void *target,
-						const void *entry);
-static void afs_cell_cache_update(void *source, void *entry);
-
-struct cachefs_index_def afs_cache_cell_index_def = {
-	.name			= "cell_ix",
-	.data_size		= sizeof(struct afs_cache_cell),
-	.keys[0]		= { CACHEFS_INDEX_KEYS_ASCIIZ, 64 },
-	.match			= afs_cell_cache_match,
-	.update			= afs_cell_cache_update,
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_cell_cache_get_key(const void *cookie_netfs_data,
+				       void *buffer, uint16_t buflen);
+static uint16_t afs_cell_cache_get_aux(const void *cookie_netfs_data,
+				       void *buffer, uint16_t buflen);
+static fscache_checkaux_t afs_cell_cache_check_aux(void *cookie_netfs_data,
+						   const void *buffer,
+						   uint16_t buflen);
+
+static struct fscache_cookie_def afs_cell_cache_index_def = {
+	.name		= "AFS cell",
+	.type		= FSCACHE_COOKIE_TYPE_INDEX,
+	.get_key	= afs_cell_cache_get_key,
+	.get_aux	= afs_cell_cache_get_aux,
+	.check_aux	= afs_cell_cache_check_aux,
 };
 #endif
 
@@ -115,12 +119,11 @@ int afs_cell_create(const char *name, ch
 	if (ret < 0)
 		goto error;
 
-#ifdef AFS_CACHING_SUPPORT
-	/* put it up for caching */
-	cachefs_acquire_cookie(afs_cache_netfs.primary_index,
-			       &afs_vlocation_cache_index_def,
-			       cell,
-			       &cell->cache);
+#ifdef CONFIG_AFS_FSCACHE
+	/* put it up for caching (this never returns an error) */
+	cell->cache = fscache_acquire_cookie(afs_cache_netfs.primary_index,
+					     &afs_cell_cache_index_def,
+					     cell);
 #endif
 
 	/* add to the cell lists */
@@ -345,8 +348,8 @@ static void afs_cell_destroy(struct afs_
 	list_del_init(&cell->proc_link);
 	up_write(&afs_proc_cells_sem);
 
-#ifdef AFS_CACHING_SUPPORT
-	cachefs_relinquish_cookie(cell->cache, 0);
+#ifdef CONFIG_AFS_FSCACHE
+	fscache_relinquish_cookie(cell->cache, 0);
 #endif
 
 	up_write(&afs_cells_sem);
@@ -525,44 +528,62 @@ void afs_cell_purge(void)
 
 /*****************************************************************************/
 /*
- * match a cell record obtained from the cache
+ * set the key for the index entry
  */
-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_cell_cache_match(void *target,
-						const void *entry)
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_cell_cache_get_key(const void *cookie_netfs_data,
+				       void *buffer, uint16_t bufmax)
 {
-	const struct afs_cache_cell *ccell = entry;
-	struct afs_cell *cell = target;
+	const struct afs_cell *cell = cookie_netfs_data;
+	uint16_t klen;
 
-	_enter("{%s},{%s}", ccell->name, cell->name);
+	_enter("%p,%p,%u", cell, buffer, bufmax);
 
-	if (strncmp(ccell->name, cell->name, sizeof(ccell->name)) == 0) {
-		_leave(" = SUCCESS");
-		return CACHEFS_MATCH_SUCCESS;
-	}
+	klen = strlen(cell->name);
+	if (klen > bufmax)
+		return 0;
+
+	memcpy(buffer, cell->name, klen);
+	return klen;
 
-	_leave(" = FAILED");
-	return CACHEFS_MATCH_FAILED;
-} /* end afs_cell_cache_match() */
+} /* end afs_cell_cache_get_key() */
 #endif
 
 /*****************************************************************************/
 /*
- * update a cell record in the cache
+ * provide new auxilliary cache data
  */
-#ifdef AFS_CACHING_SUPPORT
-static void afs_cell_cache_update(void *source, void *entry)
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_cell_cache_get_aux(const void *cookie_netfs_data,
+				       void *buffer, uint16_t bufmax)
 {
-	struct afs_cache_cell *ccell = entry;
-	struct afs_cell *cell = source;
+	const struct afs_cell *cell = cookie_netfs_data;
+	uint16_t dlen;
 
-	_enter("%p,%p", source, entry);
+	_enter("%p,%p,%u", cell, buffer, bufmax);
 
-	strncpy(ccell->name, cell->name, sizeof(ccell->name));
+	dlen = cell->vl_naddrs * sizeof(cell->vl_addrs[0]);
+	dlen = min(dlen, bufmax);
+	dlen &= ~(sizeof(cell->vl_addrs[0]) - 1);
 
-	memcpy(ccell->vl_servers,
-	       cell->vl_addrs,
-	       min(sizeof(ccell->vl_servers), sizeof(cell->vl_addrs)));
+	memcpy(buffer, cell->vl_addrs, dlen);
+
+	return dlen;
+
+} /* end afs_cell_cache_get_aux() */
+#endif
+
+/*****************************************************************************/
+/*
+ * check that the auxilliary data indicates that the entry is still valid
+ */
+#ifdef CONFIG_AFS_FSCACHE
+static fscache_checkaux_t afs_cell_cache_check_aux(void *cookie_netfs_data,
+						   const void *buffer,
+						   uint16_t buflen)
+{
+	_leave(" = OKAY");
+	return FSCACHE_CHECKAUX_OKAY;
 
-} /* end afs_cell_cache_update() */
+} /* end afs_cell_cache_check_aux() */
 #endif
diff --git a/fs/afs/cell.h b/fs/afs/cell.h
index 4834910..d670502 100644
--- a/fs/afs/cell.h
+++ b/fs/afs/cell.h
@@ -13,7 +13,7 @@ #ifndef _LINUX_AFS_CELL_H
 #define _LINUX_AFS_CELL_H
 
 #include "types.h"
-#include "cache.h"
+#include <linux/fscache.h>
 
 #define AFS_CELL_MAX_ADDRS 15
 
@@ -21,16 +21,6 @@ extern volatile int afs_cells_being_purg
 
 /*****************************************************************************/
 /*
- * entry in the cached cell catalogue
- */
-struct afs_cache_cell
-{
-	char			name[64];	/* cell name (padded with NULs) */
-	struct in_addr		vl_servers[15];	/* cached cell VL servers */
-};
-
-/*****************************************************************************/
-/*
  * AFS cell record
  */
 struct afs_cell
@@ -39,8 +29,8 @@ struct afs_cell
 	struct list_head	link;		/* main cell list link */
 	struct list_head	proc_link;	/* /proc cell list link */
 	struct proc_dir_entry	*proc_dir;	/* /proc dir for this cell */
-#ifdef AFS_CACHING_SUPPORT
-	struct cachefs_cookie	*cache;		/* caching cookie */
+#ifdef CONFIG_AFS_FSCACHE
+	struct fscache_cookie	*cache;		/* caching cookie */
 #endif
 
 	/* server record management */
diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c
index 3d097fd..f87d5a7 100644
--- a/fs/afs/cmservice.c
+++ b/fs/afs/cmservice.c
@@ -24,7 +24,7 @@ #include "cmservice.h"
 #include "internal.h"
 
 static unsigned afscm_usage;		/* AFS cache manager usage count */
-static struct rw_semaphore afscm_sem;	/* AFS cache manager start/stop semaphore */
+static DECLARE_RWSEM(afscm_sem);	/* AFS cache manager start/stop semaphore */
 
 static int afscm_new_call(struct rxrpc_call *call);
 static void afscm_attention(struct rxrpc_call *call);
diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index f1c965f..94afb75 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -145,7 +145,7 @@ #endif
 	qty /= sizeof(union afs_dir_block);
 
 	/* check them */
-	dbuf = page_address(page);
+	dbuf = kmap_atomic(page, KM_USER0);
 	for (tmp = 0; tmp < qty; tmp++) {
 		if (dbuf->blocks[tmp].pagehdr.magic != AFS_DIR_MAGIC) {
 			printk("kAFS: %s(%lu): bad magic %d/%d is %04hx\n",
@@ -154,10 +154,12 @@ #endif
 			goto error;
 		}
 	}
+	kunmap_atomic(dbuf, KM_USER0);
 
 	return;
 
  error:
+	kunmap_atomic(dbuf, KM_USER0);
 	SetPageError(page);
 
 } /* end afs_dir_check_page() */
@@ -168,7 +170,6 @@ #endif
  */
 static inline void afs_dir_put_page(struct page *page)
 {
-	kunmap(page);
 	page_cache_release(page);
 
 } /* end afs_dir_put_page() */
@@ -186,7 +187,6 @@ static struct page *afs_dir_get_page(str
 	page = read_mapping_page(dir->i_mapping, index, NULL);
 	if (!IS_ERR(page)) {
 		wait_on_page_locked(page);
-		kmap(page);
 		if (!PageUptodate(page))
 			goto fail;
 		afs_dir_check_page(dir, page);
@@ -354,7 +354,7 @@ static int afs_dir_iterate(struct inode 
 
 		limit = blkoff & ~(PAGE_SIZE - 1);
 
-		dbuf = page_address(page);
+		dbuf = kmap_atomic(page, KM_USER0);
 
 		/* deal with the individual blocks stashed on this page */
 		do {
@@ -363,6 +363,7 @@ static int afs_dir_iterate(struct inode 
 			ret = afs_dir_iterate_block(fpos, dblock, blkoff,
 						    cookie, filldir);
 			if (ret != 1) {
+				kunmap_atomic(dbuf, KM_USER0);
 				afs_dir_put_page(page);
 				goto out;
 			}
@@ -371,6 +372,7 @@ static int afs_dir_iterate(struct inode 
 
 		} while (*fpos < dir->i_size && blkoff < limit);
 
+		kunmap_atomic(dbuf, KM_USER0);
 		afs_dir_put_page(page);
 		ret = 0;
 	}
diff --git a/fs/afs/file.c b/fs/afs/file.c
index 67d6634..93f2cc0 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -16,12 +16,15 @@ #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/fs.h>
 #include <linux/pagemap.h>
+#include <linux/pagevec.h>
 #include <linux/buffer_head.h>
 #include "volume.h"
 #include "vnode.h"
 #include <rxrpc/call.h>
 #include "internal.h"
 
+#define list_to_page(head) (list_entry((head)->prev, struct page, lru))
+
 #if 0
 static int afs_file_open(struct inode *inode, struct file *file);
 static int afs_file_release(struct inode *inode, struct file *file);
@@ -30,55 +33,93 @@ #endif
 static int afs_file_readpage(struct file *file, struct page *page);
 static void afs_file_invalidatepage(struct page *page, unsigned long offset);
 static int afs_file_releasepage(struct page *page, gfp_t gfp_flags);
+static int afs_file_mmap(struct file * file, struct vm_area_struct * vma);
+
+#ifdef CONFIG_AFS_FSCACHE
+static int afs_file_readpages(struct file *filp, struct address_space *mapping,
+			      struct list_head *pages, unsigned nr_pages);
+static int afs_file_page_mkwrite(struct vm_area_struct *vma, struct page *page);
+#endif
 
 struct inode_operations afs_file_inode_operations = {
 	.getattr	= afs_inode_getattr,
 };
 
+const struct file_operations afs_file_file_operations = {
+	.llseek		= generic_file_llseek,
+	.read		= generic_file_read,
+	.mmap		= afs_file_mmap,
+	.sendfile	= generic_file_sendfile,
+};
+
 const struct address_space_operations afs_fs_aops = {
 	.readpage	= afs_file_readpage,
+#ifdef CONFIG_AFS_FSCACHE
+	.readpages	= afs_file_readpages,
+#endif
 	.sync_page	= block_sync_page,
 	.set_page_dirty	= __set_page_dirty_nobuffers,
 	.releasepage	= afs_file_releasepage,
 	.invalidatepage	= afs_file_invalidatepage,
 };
 
+static struct vm_operations_struct afs_fs_vm_operations = {
+	.nopage		= filemap_nopage,
+	.populate	= filemap_populate,
+#ifdef CONFIG_AFS_FSCACHE
+	.page_mkwrite	= afs_file_page_mkwrite,
+#endif
+};
+
+/*****************************************************************************/
+/*
+ * set up a memory mapping on an AFS file
+ * - we set our own VMA ops so that we can catch the page becoming writable for
+ *   userspace for shared-writable mmap
+ */
+static int afs_file_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	_enter("");
+
+	file_accessed(file);
+	vma->vm_ops = &afs_fs_vm_operations;
+	return 0;
+}
+
 /*****************************************************************************/
 /*
  * deal with notification that a page was read from the cache
  */
-#ifdef AFS_CACHING_SUPPORT
-static void afs_file_readpage_read_complete(void *cookie_data,
-					    struct page *page,
+#ifdef CONFIG_AFS_FSCACHE
+static void afs_file_readpage_read_complete(struct page *page,
 					    void *data,
 					    int error)
 {
-	_enter("%p,%p,%p,%d", cookie_data, page, data, error);
+	_enter("%p,%p,%d", page, data, error);
 
-	if (error)
-		SetPageError(page);
-	else
+	/* if the read completes with an error, we just unlock the page and let
+	 * the VM reissue the readpage */
+	if (!error)
 		SetPageUptodate(page);
 	unlock_page(page);
-
-} /* end afs_file_readpage_read_complete() */
+}
 #endif
 
 /*****************************************************************************/
 /*
  * deal with notification that a page was written to the cache
  */
-#ifdef AFS_CACHING_SUPPORT
-static void afs_file_readpage_write_complete(void *cookie_data,
-					     struct page *page,
+#ifdef CONFIG_AFS_FSCACHE
+static void afs_file_readpage_write_complete(struct page *page,
 					     void *data,
 					     int error)
 {
-	_enter("%p,%p,%p,%d", cookie_data, page, data, error);
-
-	unlock_page(page);
+	_enter("%p,%p,%d", page, data, error);
 
-} /* end afs_file_readpage_write_complete() */
+	/* note that the page has been written to the cache and can now be
+	 * modified */
+	end_page_fs_misc(page);
+}
 #endif
 
 /*****************************************************************************/
@@ -88,16 +129,13 @@ #endif
 static int afs_file_readpage(struct file *file, struct page *page)
 {
 	struct afs_rxfs_fetch_descriptor desc;
-#ifdef AFS_CACHING_SUPPORT
-	struct cachefs_page *pageio;
-#endif
 	struct afs_vnode *vnode;
 	struct inode *inode;
 	int ret;
 
 	inode = page->mapping->host;
 
-	_enter("{%lu},{%lu}", inode->i_ino, page->index);
+	_enter("{%lu},%p{%lu}", inode->i_ino, page, page->index);
 
 	vnode = AFS_FS_I(inode);
 
@@ -107,13 +145,9 @@ #endif
 	if (vnode->flags & AFS_VNODE_DELETED)
 		goto error;
 
-#ifdef AFS_CACHING_SUPPORT
-	ret = cachefs_page_get_private(page, &pageio, GFP_NOIO);
-	if (ret < 0)
-		goto error;
-
+#ifdef CONFIG_AFS_FSCACHE
 	/* is it cached? */
-	ret = cachefs_read_or_alloc_page(vnode->cache,
+	ret = fscache_read_or_alloc_page(vnode->cache,
 					 page,
 					 afs_file_readpage_read_complete,
 					 NULL,
@@ -123,18 +157,20 @@ #else
 #endif
 
 	switch (ret) {
-		/* read BIO submitted and wb-journal entry found */
-	case 1:
-		BUG(); // TODO - handle wb-journal match
-
 		/* read BIO submitted (page in cache) */
 	case 0:
 		break;
 
-		/* no page available in cache */
-	case -ENOBUFS:
+		/* page not yet cached */
 	case -ENODATA:
+		_debug("cache said ENODATA");
+		goto go_on;
+
+		/* page will not be cached */
+	case -ENOBUFS:
+		_debug("cache said ENOBUFS");
 	default:
+	go_on:
 		desc.fid	= vnode->fid;
 		desc.offset	= page->index << PAGE_CACHE_SHIFT;
 		desc.size	= min((size_t) (inode->i_size - desc.offset),
@@ -148,34 +184,40 @@ #endif
 		ret = afs_vnode_fetch_data(vnode, &desc);
 		kunmap(page);
 		if (ret < 0) {
-			if (ret==-ENOENT) {
-				_debug("got NOENT from server"
+			if (ret == -ENOENT) {
+				kdebug("got NOENT from server"
 				       " - marking file deleted and stale");
 				vnode->flags |= AFS_VNODE_DELETED;
 				ret = -ESTALE;
 			}
 
-#ifdef AFS_CACHING_SUPPORT
-			cachefs_uncache_page(vnode->cache, page);
+#ifdef CONFIG_AFS_FSCACHE
+			fscache_uncache_page(vnode->cache, page);
+			ClearPagePrivate(page);
 #endif
 			goto error;
 		}
 
 		SetPageUptodate(page);
 
-#ifdef AFS_CACHING_SUPPORT
-		if (cachefs_write_page(vnode->cache,
-				       page,
-				       afs_file_readpage_write_complete,
-				       NULL,
-				       GFP_KERNEL) != 0
-		    ) {
-			cachefs_uncache_page(vnode->cache, page);
-			unlock_page(page);
+		/* send the page to the cache */
+#ifdef CONFIG_AFS_FSCACHE
+		if (PagePrivate(page)) {
+			if (TestSetPageFsMisc(page))
+				BUG();
+			if (fscache_write_page(vnode->cache,
+					       page,
+					       afs_file_readpage_write_complete,
+					       NULL,
+					       GFP_KERNEL) != 0
+			    ) {
+				fscache_uncache_page(vnode->cache, page);
+				ClearPagePrivate(page);
+				end_page_fs_misc(page);
+			}
 		}
-#else
-		unlock_page(page);
 #endif
+		unlock_page(page);
 	}
 
 	_leave(" = 0");
@@ -187,87 +229,122 @@ #endif
 
 	_leave(" = %d", ret);
 	return ret;
-
-} /* end afs_file_readpage() */
+}
 
 /*****************************************************************************/
 /*
- * get a page cookie for the specified page
+ * read a set of pages
  */
-#ifdef AFS_CACHING_SUPPORT
-int afs_cache_get_page_cookie(struct page *page,
-			      struct cachefs_page **_page_cookie)
+#ifdef CONFIG_AFS_FSCACHE
+static int afs_file_readpages(struct file *filp, struct address_space *mapping,
+			      struct list_head *pages, unsigned nr_pages)
 {
-	int ret;
+	struct afs_vnode *vnode;
+	int ret = 0;
 
-	_enter("");
-	ret = cachefs_page_get_private(page,_page_cookie, GFP_NOIO);
+	_enter(",{%lu},,%d", mapping->host->i_ino, nr_pages);
 
-	_leave(" = %d", ret);
+	vnode = AFS_FS_I(mapping->host);
+	if (vnode->flags & AFS_VNODE_DELETED) {
+		_leave(" = -ESTALE");
+		return -ESTALE;
+	}
+
+	/* attempt to read as many of the pages as possible */
+	ret = fscache_read_or_alloc_pages(vnode->cache,
+					  mapping,
+					  pages,
+					  &nr_pages,
+					  afs_file_readpage_read_complete,
+					  NULL,
+					  mapping_gfp_mask(mapping));
+
+	switch (ret) {
+		/* all pages are being read from the cache */
+	case 0:
+		BUG_ON(!list_empty(pages));
+		BUG_ON(nr_pages != 0);
+		_leave(" = 0 [reading all]");
+		return 0;
+
+		/* there were pages that couldn't be read from the cache */
+	case -ENODATA:
+	case -ENOBUFS:
+		break;
+
+		/* other error */
+	default:
+		_leave(" = %d", ret);
+		return ret;
+	}
+
+	/* load the missing pages from the network */
+	ret = read_cache_pages(mapping, pages,
+			       (void *) afs_file_readpage, NULL);
+
+	_leave(" = %d [netting]", ret);
 	return ret;
-} /* end afs_cache_get_page_cookie() */
+}
 #endif
 
 /*****************************************************************************/
 /*
  * invalidate part or all of a page
+ * - release a page and clean up its private data if offset is 0 (indicating
+ *   the entire page)
  */
 static void afs_file_invalidatepage(struct page *page, unsigned long offset)
 {
-	int ret = 1;
-
 	_enter("{%lu},%lu", page->index, offset);
 
 	BUG_ON(!PageLocked(page));
 
 	if (PagePrivate(page)) {
-#ifdef AFS_CACHING_SUPPORT
-		struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
-		cachefs_uncache_page(vnode->cache,page);
+		/* we clean up only if the entire page is being invalidated */
+		if (offset == 0 && !PageWriteback(page)) {
+#ifdef CONFIG_AFS_FSCACHE
+			wait_on_page_fs_misc(page);
+			fscache_uncache_page(
+				AFS_FS_I(page->mapping->host)->cache, page);
+			ClearPagePrivate(page);
 #endif
-
-		/* We release buffers only if the entire page is being
-		 * invalidated.
-		 * The get_block cached value has been unconditionally
-		 * invalidated, so real IO is not possible anymore.
-		 */
-		if (offset == 0) {
-			BUG_ON(!PageLocked(page));
-
-			ret = 0;
-			if (!PageWriteback(page))
-				ret = page->mapping->a_ops->releasepage(page,
-									0);
-			/* possibly should BUG_ON(!ret); - neilb */
 		}
 	}
 
-	_leave(" = %d", ret);
-} /* end afs_file_invalidatepage() */
+	_leave("");
+}
 
 /*****************************************************************************/
 /*
- * release a page and cleanup its private data
+ * release a page and clean up its private state if it's not busy
+ * - return true if the page can now be released, false if not
  */
 static int afs_file_releasepage(struct page *page, gfp_t gfp_flags)
 {
-	struct cachefs_page *pageio;
-
 	_enter("{%lu},%x", page->index, gfp_flags);
 
-	if (PagePrivate(page)) {
-#ifdef AFS_CACHING_SUPPORT
-		struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
-		cachefs_uncache_page(vnode->cache, page);
-#endif
+	/* deny */
+	if (PageFsMisc(page)) {
+		_leave(" = F");
+		return 0;
+	}
 
-		pageio = (struct cachefs_page *) page_private(page);
-		set_page_private(page, 0);
-		ClearPagePrivate(page);
+	fscache_uncache_page(AFS_FS_I(page->mapping->host)->cache, page);
 
-		kfree(pageio);
-	}
+	/* indicate that the page can be released */
+	_leave(" = T");
+	return 1;
+}
 
-	_leave(" = 0");
+/*****************************************************************************/
+/*
+ * wait for the disc cache to finish writing before permitting modification of
+ * our page in the page cache
+ */
+#ifdef CONFIG_AFS_FSCACHE
+static int afs_file_page_mkwrite(struct vm_area_struct *vma, struct page *page)
+{
+	wait_on_page_fs_misc(page);
 	return 0;
-} /* end afs_file_releasepage() */
+}
+#endif
diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 61bc371..c88c41a 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -398,6 +398,8 @@ int afs_rxfs_fetch_file_status(struct af
 		bp++; /* spare6 */
 	}
 
+	_debug("Data Version %llx\n", vnode->status.version);
+
 	/* success */
 	ret = 0;
 
@@ -408,7 +410,7 @@ int afs_rxfs_fetch_file_status(struct af
  out_put_conn:
 	afs_server_release_callslot(server, &callslot);
  out:
-	_leave("");
+	_leave(" = %d", ret);
 	return ret;
 
  abort:
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 4ebb30a..0a59eda 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -49,7 +49,7 @@ static int afs_inode_map_status(struct a
 	case AFS_FTYPE_FILE:
 		inode->i_mode	= S_IFREG | vnode->status.mode;
 		inode->i_op	= &afs_file_inode_operations;
-		inode->i_fop	= &generic_ro_fops;
+		inode->i_fop	= &afs_file_file_operations;
 		break;
 	case AFS_FTYPE_DIR:
 		inode->i_mode	= S_IFDIR | vnode->status.mode;
@@ -65,6 +65,11 @@ static int afs_inode_map_status(struct a
 		return -EBADMSG;
 	}
 
+#ifdef CONFIG_AFS_FSCACHE
+	if (vnode->status.size != inode->i_size)
+		fscache_set_i_size(vnode->cache, vnode->status.size);
+#endif
+
 	inode->i_nlink		= vnode->status.nlink;
 	inode->i_uid		= vnode->status.owner;
 	inode->i_gid		= 0;
@@ -101,13 +106,33 @@ static int afs_inode_fetch_status(struct
 	struct afs_vnode *vnode;
 	int ret;
 
+	_enter("");
+
 	vnode = AFS_FS_I(inode);
 
 	ret = afs_vnode_fetch_status(vnode);
 
-	if (ret == 0)
+	if (ret == 0) {
+#ifdef CONFIG_AFS_FSCACHE
+		if (!vnode->cache) {
+			vnode->cache =
+				fscache_acquire_cookie(vnode->volume->cache,
+						       &afs_vnode_cache_index_def,
+						       vnode);
+			if (!vnode->cache)
+				printk("Negative\n");
+		}
+#endif
 		ret = afs_inode_map_status(vnode);
+#ifdef CONFIG_AFS_FSCACHE
+		if (ret < 0) {
+			fscache_relinquish_cookie(vnode->cache, 0);
+			vnode->cache = NULL;
+		}
+#endif
+	}
 
+	_leave(" = %d", ret);
 	return ret;
 
 } /* end afs_inode_fetch_status() */
@@ -122,6 +147,7 @@ static int afs_iget5_test(struct inode *
 
 	return inode->i_ino == data->fid.vnode &&
 		inode->i_version == data->fid.unique;
+
 } /* end afs_iget5_test() */
 
 /*****************************************************************************/
@@ -179,20 +205,11 @@ inline int afs_iget(struct super_block *
 		return ret;
 	}
 
-#ifdef AFS_CACHING_SUPPORT
-	/* set up caching before reading the status, as fetch-status reads the
-	 * first page of symlinks to see if they're really mntpts */
-	cachefs_acquire_cookie(vnode->volume->cache,
-			       NULL,
-			       vnode,
-			       &vnode->cache);
-#endif
-
 	/* okay... it's a new inode */
 	inode->i_flags |= S_NOATIME;
 	vnode->flags |= AFS_VNODE_CHANGED;
 	ret = afs_inode_fetch_status(inode);
-	if (ret<0)
+	if (ret < 0)
 		goto bad_inode;
 
 	/* success */
@@ -278,8 +295,8 @@ void afs_clear_inode(struct inode *inode
 
 	afs_vnode_give_up_callback(vnode);
 
-#ifdef AFS_CACHING_SUPPORT
-	cachefs_relinquish_cookie(vnode->cache, 0);
+#ifdef CONFIG_AFS_FSCACHE
+	fscache_relinquish_cookie(vnode->cache, 0);
 	vnode->cache = NULL;
 #endif
 
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index e88b3b6..482dbd1 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -16,15 +16,17 @@ #include <linux/compiler.h>
 #include <linux/kernel.h>
 #include <linux/fs.h>
 #include <linux/pagemap.h>
+#include <linux/fscache.h>
 
 /*
  * debug tracing
  */
-#define kenter(FMT, a...)	printk("==> %s("FMT")\n",__FUNCTION__ , ## a)
-#define kleave(FMT, a...)	printk("<== %s()"FMT"\n",__FUNCTION__ , ## a)
-#define kdebug(FMT, a...)	printk(FMT"\n" , ## a)
-#define kproto(FMT, a...)	printk("### "FMT"\n" , ## a)
-#define knet(FMT, a...)		printk(FMT"\n" , ## a)
+#define __kdbg(FMT, a...)	printk("[%05d] "FMT"\n", current->pid , ## a)
+#define kenter(FMT, a...)	__kdbg("==> %s("FMT")", __FUNCTION__ , ## a)
+#define kleave(FMT, a...)	__kdbg("<== %s()"FMT, __FUNCTION__ , ## a)
+#define kdebug(FMT, a...)	__kdbg(FMT , ## a)
+#define kproto(FMT, a...)	__kdbg("### "FMT , ## a)
+#define knet(FMT, a...)		__kdbg(FMT , ## a)
 
 #ifdef __KDEBUG
 #define _enter(FMT, a...)	kenter(FMT , ## a)
@@ -56,9 +58,6 @@ static inline void afs_discard_my_signal
  */
 extern struct rw_semaphore afs_proc_cells_sem;
 extern struct list_head afs_proc_cells;
-#ifdef AFS_CACHING_SUPPORT
-extern struct cachefs_index_def afs_cache_cell_index_def;
-#endif
 
 /*
  * dir.c
@@ -71,11 +70,7 @@ extern const struct file_operations afs_
  */
 extern const struct address_space_operations afs_fs_aops;
 extern struct inode_operations afs_file_inode_operations;
-
-#ifdef AFS_CACHING_SUPPORT
-extern int afs_cache_get_page_cookie(struct page *page,
-				     struct cachefs_page **_page_cookie);
-#endif
+extern const struct file_operations afs_file_file_operations;
 
 /*
  * inode.c
@@ -97,8 +92,8 @@ #endif
 /*
  * main.c
  */
-#ifdef AFS_CACHING_SUPPORT
-extern struct cachefs_netfs afs_cache_netfs;
+#ifdef CONFIG_AFS_FSCACHE
+extern struct fscache_netfs afs_cache_netfs;
 #endif
 
 /*
diff --git a/fs/afs/main.c b/fs/afs/main.c
index 913c689..5840bb2 100644
--- a/fs/afs/main.c
+++ b/fs/afs/main.c
@@ -1,6 +1,6 @@
 /* main.c: AFS client file system
  *
- * Copyright (C) 2002 Red Hat, Inc. All Rights Reserved.
+ * Copyright (C) 2002,5 Red Hat, Inc. All Rights Reserved.
  * Written by David Howells (dhowells@redhat.com)
  *
  * This program is free software; you can redistribute it and/or
@@ -14,11 +14,11 @@ #include <linux/moduleparam.h>
 #include <linux/init.h>
 #include <linux/sched.h>
 #include <linux/completion.h>
+#include <linux/fscache.h>
 #include <rxrpc/rxrpc.h>
 #include <rxrpc/transport.h>
 #include <rxrpc/call.h>
 #include <rxrpc/peer.h>
-#include "cache.h"
 #include "cell.h"
 #include "server.h"
 #include "fsclient.h"
@@ -51,12 +51,11 @@ static struct rxrpc_peer_ops afs_peer_op
 struct list_head afs_cb_hash_tbl[AFS_CB_HASH_COUNT];
 DEFINE_SPINLOCK(afs_cb_hash_lock);
 
-#ifdef AFS_CACHING_SUPPORT
-static struct cachefs_netfs_operations afs_cache_ops = {
-	.get_page_cookie	= afs_cache_get_page_cookie,
+#ifdef CONFIG_AFS_FSCACHE
+static struct fscache_netfs_operations afs_cache_ops = {
 };
 
-struct cachefs_netfs afs_cache_netfs = {
+struct fscache_netfs afs_cache_netfs = {
 	.name			= "afs",
 	.version		= 0,
 	.ops			= &afs_cache_ops,
@@ -83,10 +82,9 @@ static int __init afs_init(void)
 	if (ret < 0)
 		return ret;
 
-#ifdef AFS_CACHING_SUPPORT
+#ifdef CONFIG_AFS_FSCACHE
 	/* we want to be able to cache */
-	ret = cachefs_register_netfs(&afs_cache_netfs,
-				     &afs_cache_cell_index_def);
+	ret = fscache_register_netfs(&afs_cache_netfs);
 	if (ret < 0)
 		goto error;
 #endif
@@ -137,8 +135,8 @@ #ifdef CONFIG_KEYS_TURNED_OFF
 	afs_key_unregister();
  error_cache:
 #endif
-#ifdef AFS_CACHING_SUPPORT
-	cachefs_unregister_netfs(&afs_cache_netfs);
+#ifdef CONFIG_AFS_FSCACHE
+	fscache_unregister_netfs(&afs_cache_netfs);
  error:
 #endif
 	afs_cell_purge();
@@ -167,8 +165,8 @@ static void __exit afs_exit(void)
 #ifdef CONFIG_KEYS_TURNED_OFF
 	afs_key_unregister();
 #endif
-#ifdef AFS_CACHING_SUPPORT
-	cachefs_unregister_netfs(&afs_cache_netfs);
+#ifdef CONFIG_AFS_FSCACHE
+	fscache_unregister_netfs(&afs_cache_netfs);
 #endif
 	afs_proc_cleanup();
 
diff --git a/fs/afs/mntpt.c b/fs/afs/mntpt.c
index 99785a7..2a53d51 100644
--- a/fs/afs/mntpt.c
+++ b/fs/afs/mntpt.c
@@ -78,7 +78,7 @@ int afs_mntpt_check_symlink(struct afs_v
 
 	ret = -EIO;
 	wait_on_page_locked(page);
-	buf = kmap(page);
+	buf = kmap_atomic(page, KM_USER0);
 	if (!PageUptodate(page))
 		goto out_free;
 	if (PageError(page))
@@ -101,7 +101,7 @@ int afs_mntpt_check_symlink(struct afs_v
 	ret = 0;
 
  out_free:
-	kunmap(page);
+	kunmap_atomic(buf, KM_USER0);
 	page_cache_release(page);
  out:
 	_leave(" = %d", ret);
@@ -188,9 +188,9 @@ static struct vfsmount *afs_mntpt_do_aut
 	if (!PageUptodate(page) || PageError(page))
 		goto error;
 
-	buf = kmap(page);
+	buf = kmap_atomic(page, KM_USER0);
 	memcpy(devname, buf, size);
-	kunmap(page);
+	kunmap_atomic(buf, KM_USER0);
 	page_cache_release(page);
 	page = NULL;
 
@@ -269,12 +269,12 @@ static void *afs_mntpt_follow_link(struc
  */
 static void afs_mntpt_expiry_timed_out(struct afs_timer *timer)
 {
-	kenter("");
+//	kenter("");
 
 	mark_mounts_for_expiry(&afs_vfsmounts);
 
 	afs_kafstimod_add_timer(&afs_mntpt_expiry_timer,
 				afs_mntpt_expiry_timeout * HZ);
 
-	kleave("");
+//	kleave("");
 } /* end afs_mntpt_expiry_timed_out() */
diff --git a/fs/afs/proc.c b/fs/afs/proc.c
index 101d21b..db58488 100644
--- a/fs/afs/proc.c
+++ b/fs/afs/proc.c
@@ -177,6 +177,7 @@ int afs_proc_init(void)
  */
 void afs_proc_cleanup(void)
 {
+	remove_proc_entry("rootcell", proc_afs);
 	remove_proc_entry("cells", proc_afs);
 
 	remove_proc_entry("fs/afs", NULL);
diff --git a/fs/afs/server.c b/fs/afs/server.c
index 22afaae..e94628c 100644
--- a/fs/afs/server.c
+++ b/fs/afs/server.c
@@ -375,7 +375,6 @@ int afs_server_request_callslot(struct a
 	else if (list_empty(&server->fs_callq)) {
 		/* no one waiting */
 		server->fs_conn_cnt[nconn]++;
-		spin_unlock(&server->fs_lock);
 	}
 	else {
 		/* someone's waiting - dequeue them and wake them up */
@@ -393,9 +392,9 @@ int afs_server_request_callslot(struct a
 		}
 		pcallslot->ready = 1;
 		wake_up_process(pcallslot->task);
-		spin_unlock(&server->fs_lock);
 	}
 
+	spin_unlock(&server->fs_lock);
 	rxrpc_put_connection(callslot->conn);
 	callslot->conn = NULL;
 
diff --git a/fs/afs/vlocation.c b/fs/afs/vlocation.c
index 331f730..20148bc 100644
--- a/fs/afs/vlocation.c
+++ b/fs/afs/vlocation.c
@@ -59,17 +59,21 @@ static LIST_HEAD(afs_vlocation_update_pe
 static struct afs_vlocation *afs_vlocation_update;	/* VL currently being updated */
 static DEFINE_SPINLOCK(afs_vlocation_update_lock); /* lock guarding update queue */
 
-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_vlocation_cache_match(void *target,
-						     const void *entry);
-static void afs_vlocation_cache_update(void *source, void *entry);
-
-struct cachefs_index_def afs_vlocation_cache_index_def = {
-	.name		= "vldb",
-	.data_size	= sizeof(struct afs_cache_vlocation),
-	.keys[0]	= { CACHEFS_INDEX_KEYS_ASCIIZ, 64 },
-	.match		= afs_vlocation_cache_match,
-	.update		= afs_vlocation_cache_update,
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_vlocation_cache_get_key(const void *cookie_netfs_data,
+					    void *buffer, uint16_t buflen);
+static uint16_t afs_vlocation_cache_get_aux(const void *cookie_netfs_data,
+					    void *buffer, uint16_t buflen);
+static fscache_checkaux_t afs_vlocation_cache_check_aux(void *cookie_netfs_data,
+							const void *buffer,
+							uint16_t buflen);
+
+static struct fscache_cookie_def afs_vlocation_cache_index_def = {
+	.name		= "AFS.vldb",
+	.type		= FSCACHE_COOKIE_TYPE_INDEX,
+	.get_key	= afs_vlocation_cache_get_key,
+	.get_aux	= afs_vlocation_cache_get_aux,
+	.check_aux	= afs_vlocation_cache_check_aux,
 };
 #endif
 
@@ -300,13 +304,12 @@ int afs_vlocation_lookup(struct afs_cell
 
 	list_add_tail(&vlocation->link, &cell->vl_list);
 
-#ifdef AFS_CACHING_SUPPORT
+#ifdef CONFIG_AFS_FSCACHE
 	/* we want to store it in the cache, plus it might already be
 	 * encached */
-	cachefs_acquire_cookie(cell->cache,
-			       &afs_volume_cache_index_def,
-			       vlocation,
-			       &vlocation->cache);
+	vlocation->cache = fscache_acquire_cookie(cell->cache,
+						  &afs_vlocation_cache_index_def,
+						  vlocation);
 
 	if (vlocation->valid)
 		goto found_in_cache;
@@ -340,7 +343,7 @@ #endif
  active:
 	active = 1;
 
-#ifdef AFS_CACHING_SUPPORT
+#ifdef CONFIG_AFS_FSCACHE
  found_in_cache:
 #endif
 	/* try to look up a cached volume in the cell VL databases by ID */
@@ -422,9 +425,9 @@ #endif
 
 	afs_kafstimod_add_timer(&vlocation->upd_timer, 10 * HZ);
 
-#ifdef AFS_CACHING_SUPPORT
+#ifdef CONFIG_AFS_FSCACHE
 	/* update volume entry in local cache */
-	cachefs_update_cookie(vlocation->cache);
+	fscache_update_cookie(vlocation->cache);
 #endif
 
 	*_vlocation = vlocation;
@@ -438,8 +441,8 @@ #endif
 		}
 		else {
 			list_del(&vlocation->link);
-#ifdef AFS_CACHING_SUPPORT
-			cachefs_relinquish_cookie(vlocation->cache, 0);
+#ifdef CONFIG_AFS_FSCACHE
+			fscache_relinquish_cookie(vlocation->cache, 0);
 #endif
 			afs_put_cell(vlocation->cell);
 			kfree(vlocation);
@@ -536,8 +539,8 @@ void afs_vlocation_do_timeout(struct afs
 	}
 
 	/* we can now destroy it properly */
-#ifdef AFS_CACHING_SUPPORT
-	cachefs_relinquish_cookie(vlocation->cache, 0);
+#ifdef CONFIG_AFS_FSCACHE
+	fscache_relinquish_cookie(vlocation->cache, 0);
 #endif
 	afs_put_cell(cell);
 
@@ -888,65 +891,103 @@ static void afs_vlocation_update_discard
 
 /*****************************************************************************/
 /*
- * match a VLDB record stored in the cache
- * - may also load target from entry
+ * set the key for the index entry
  */
-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_vlocation_cache_match(void *target,
-						     const void *entry)
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_vlocation_cache_get_key(const void *cookie_netfs_data,
+					    void *buffer, uint16_t bufmax)
 {
-	const struct afs_cache_vlocation *vldb = entry;
-	struct afs_vlocation *vlocation = target;
+	const struct afs_vlocation *vlocation = cookie_netfs_data;
+	uint16_t klen;
 
-	_enter("{%s},{%s}", vlocation->vldb.name, vldb->name);
+	_enter("{%s},%p,%u", vlocation->vldb.name, buffer, bufmax);
 
-	if (strncmp(vlocation->vldb.name, vldb->name, sizeof(vldb->name)) == 0
-	    ) {
-		if (!vlocation->valid ||
-		    vlocation->vldb.rtime == vldb->rtime
-		    ) {
-			vlocation->vldb = *vldb;
-			vlocation->valid = 1;
-			_leave(" = SUCCESS [c->m]");
-			return CACHEFS_MATCH_SUCCESS;
-		}
-		/* need to update cache if cached info differs */
-		else if (memcmp(&vlocation->vldb, vldb, sizeof(*vldb)) != 0) {
-			/* delete if VIDs for this name differ */
-			if (memcmp(&vlocation->vldb.vid,
-				   &vldb->vid,
-				   sizeof(vldb->vid)) != 0) {
-				_leave(" = DELETE");
-				return CACHEFS_MATCH_SUCCESS_DELETE;
-			}
+	klen = strnlen(vlocation->vldb.name, sizeof(vlocation->vldb.name));
+	if (klen > bufmax)
+		return 0;
 
-			_leave(" = UPDATE");
-			return CACHEFS_MATCH_SUCCESS_UPDATE;
-		}
-		else {
-			_leave(" = SUCCESS");
-			return CACHEFS_MATCH_SUCCESS;
-		}
-	}
+	memcpy(buffer, vlocation->vldb.name, klen);
+
+	_leave(" = %u", klen);
+	return klen;
 
-	_leave(" = FAILED");
-	return CACHEFS_MATCH_FAILED;
-} /* end afs_vlocation_cache_match() */
+} /* end afs_vlocation_cache_get_key() */
 #endif
 
 /*****************************************************************************/
 /*
- * update a VLDB record stored in the cache
+ * provide new auxilliary cache data
  */
-#ifdef AFS_CACHING_SUPPORT
-static void afs_vlocation_cache_update(void *source, void *entry)
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_vlocation_cache_get_aux(const void *cookie_netfs_data,
+					    void *buffer, uint16_t bufmax)
 {
-	struct afs_cache_vlocation *vldb = entry;
-	struct afs_vlocation *vlocation = source;
+	const struct afs_vlocation *vlocation = cookie_netfs_data;
+	uint16_t dlen;
 
-	_enter("");
+	_enter("{%s},%p,%u", vlocation->vldb.name, buffer, bufmax);
+
+	dlen = sizeof(struct afs_cache_vlocation);
+	dlen -= offsetof(struct afs_cache_vlocation, nservers);
+	if (dlen > bufmax)
+		return 0;
+
+	memcpy(buffer, (uint8_t *)&vlocation->vldb.nservers, dlen);
+
+	_leave(" = %u", dlen);
+	return dlen;
+
+} /* end afs_vlocation_cache_get_aux() */
+#endif
+
+/*****************************************************************************/
+/*
+ * check that the auxilliary data indicates that the entry is still valid
+ */
+#ifdef CONFIG_AFS_FSCACHE
+static fscache_checkaux_t afs_vlocation_cache_check_aux(void *cookie_netfs_data,
+							const void *buffer,
+							uint16_t buflen)
+{
+	const struct afs_cache_vlocation *cvldb;
+	struct afs_vlocation *vlocation = cookie_netfs_data;
+	uint16_t dlen;
+
+	_enter("{%s},%p,%u", vlocation->vldb.name, buffer, buflen);
+
+	/* check the size of the data is what we're expecting */
+	dlen = sizeof(struct afs_cache_vlocation);
+	dlen -= offsetof(struct afs_cache_vlocation, nservers);
+	if (dlen != buflen)
+		return FSCACHE_CHECKAUX_OBSOLETE;
+
+	cvldb = container_of(buffer, struct afs_cache_vlocation, nservers);
+
+	/* if what's on disk is more valid than what's in memory, then use the
+	 * VL record from the cache */
+	if (!vlocation->valid || vlocation->vldb.rtime == cvldb->rtime) {
+		memcpy((uint8_t *)&vlocation->vldb.nservers, buffer, dlen);
+		vlocation->valid = 1;
+		_leave(" = SUCCESS [c->m]");
+		return FSCACHE_CHECKAUX_OKAY;
+	}
+
+	/* need to update the cache if the cached info differs */
+	if (memcmp(&vlocation->vldb, buffer, dlen) != 0) {
+		/* delete if the volume IDs for this name differ */
+		if (memcmp(&vlocation->vldb.vid, &cvldb->vid,
+			   sizeof(cvldb->vid)) != 0
+		    ) {
+			_leave(" = OBSOLETE");
+			return FSCACHE_CHECKAUX_OBSOLETE;
+		}
+
+		_leave(" = UPDATE");
+		return FSCACHE_CHECKAUX_NEEDS_UPDATE;
+	}
 
-	*vldb = vlocation->vldb;
+	_leave(" = OKAY");
+	return FSCACHE_CHECKAUX_OKAY;
 
-} /* end afs_vlocation_cache_update() */
+} /* end afs_vlocation_cache_check_aux() */
 #endif
diff --git a/fs/afs/vnode.c b/fs/afs/vnode.c
index cf62da5..cd72674 100644
--- a/fs/afs/vnode.c
+++ b/fs/afs/vnode.c
@@ -29,17 +29,30 @@ struct afs_timer_ops afs_vnode_cb_timed_
 	.timed_out	= afs_vnode_cb_timed_out,
 };
 
-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_vnode_cache_match(void *target,
-						 const void *entry);
-static void afs_vnode_cache_update(void *source, void *entry);
-
-struct cachefs_index_def afs_vnode_cache_index_def = {
-	.name		= "vnode",
-	.data_size	= sizeof(struct afs_cache_vnode),
-	.keys[0]	= { CACHEFS_INDEX_KEYS_BIN, 4 },
-	.match		= afs_vnode_cache_match,
-	.update		= afs_vnode_cache_update,
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_vnode_cache_get_key(const void *cookie_netfs_data,
+					void *buffer, uint16_t buflen);
+static void afs_vnode_cache_get_attr(const void *cookie_netfs_data,
+				     uint64_t *size);
+static uint16_t afs_vnode_cache_get_aux(const void *cookie_netfs_data,
+					void *buffer, uint16_t buflen);
+static fscache_checkaux_t afs_vnode_cache_check_aux(void *cookie_netfs_data,
+						    const void *buffer,
+						    uint16_t buflen);
+static void afs_vnode_cache_mark_pages_cached(void *cookie_netfs_data,
+					      struct address_space *mapping,
+					      struct pagevec *cached_pvec);
+static void afs_vnode_cache_now_uncached(void *cookie_netfs_data);
+
+struct fscache_cookie_def afs_vnode_cache_index_def = {
+	.name			= "AFS.vnode",
+	.type			= FSCACHE_COOKIE_TYPE_DATAFILE,
+	.get_key		= afs_vnode_cache_get_key,
+	.get_attr		= afs_vnode_cache_get_attr,
+	.get_aux		= afs_vnode_cache_get_aux,
+	.check_aux		= afs_vnode_cache_check_aux,
+	.mark_pages_cached	= afs_vnode_cache_mark_pages_cached,
+	.now_uncached		= afs_vnode_cache_now_uncached,
 };
 #endif
 
@@ -188,6 +201,8 @@ int afs_vnode_fetch_status(struct afs_vn
 
 	if (vnode->update_cnt > 0) {
 		/* someone else started a fetch */
+		_debug("conflict");
+
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		add_wait_queue(&vnode->update_waitq, &myself);
 
@@ -219,6 +234,7 @@ int afs_vnode_fetch_status(struct afs_vn
 		spin_unlock(&vnode->lock);
 		set_current_state(TASK_RUNNING);
 
+		_leave(" [conflicted, %d", !!(vnode->flags & AFS_VNODE_DELETED));
 		return vnode->flags & AFS_VNODE_DELETED ? -ENOENT : 0;
 	}
 
@@ -341,54 +357,200 @@ int afs_vnode_give_up_callback(struct af
 
 /*****************************************************************************/
 /*
- * match a vnode record stored in the cache
+ * set the key for the index entry
  */
-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_vnode_cache_match(void *target,
-						 const void *entry)
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_vnode_cache_get_key(const void *cookie_netfs_data,
+					void *buffer, uint16_t bufmax)
 {
-	const struct afs_cache_vnode *cvnode = entry;
-	struct afs_vnode *vnode = target;
+	const struct afs_vnode *vnode = cookie_netfs_data;
+	uint16_t klen;
 
-	_enter("{%x,%x,%Lx},{%x,%x,%Lx}",
-	       vnode->fid.vnode,
-	       vnode->fid.unique,
-	       vnode->status.version,
-	       cvnode->vnode_id,
-	       cvnode->vnode_unique,
-	       cvnode->data_version);
-
-	if (vnode->fid.vnode != cvnode->vnode_id) {
-		_leave(" = FAILED");
-		return CACHEFS_MATCH_FAILED;
+	_enter("{%x,%x,%Lx},%p,%u",
+	       vnode->fid.vnode, vnode->fid.unique, vnode->status.version,
+	       buffer, bufmax);
+
+	klen = sizeof(vnode->fid.vnode);
+	if (klen > bufmax)
+		return 0;
+
+	memcpy(buffer, &vnode->fid.vnode, sizeof(vnode->fid.vnode));
+
+	_leave(" = %u", klen);
+	return klen;
+
+} /* end afs_vnode_cache_get_key() */
+#endif
+
+/*****************************************************************************/
+/*
+ * provide an updated file attributes
+ */
+#ifdef CONFIG_AFS_FSCACHE
+static void afs_vnode_cache_get_attr(const void *cookie_netfs_data,
+				     uint64_t *size)
+{
+	const struct afs_vnode *vnode = cookie_netfs_data;
+
+	_enter("{%x,%x,%Lx},",
+	       vnode->fid.vnode, vnode->fid.unique, vnode->status.version);
+
+	*size = i_size_read((struct inode *) &vnode->vfs_inode);
+
+} /* end afs_vnode_cache_get_attr() */
+#endif
+
+/*****************************************************************************/
+/*
+ * provide new auxilliary cache data
+ */
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_vnode_cache_get_aux(const void *cookie_netfs_data,
+					void *buffer, uint16_t bufmax)
+{
+	const struct afs_vnode *vnode = cookie_netfs_data;
+	uint16_t dlen;
+
+	_enter("{%x,%x,%Lx},%p,%u",
+	       vnode->fid.vnode, vnode->fid.unique, vnode->status.version,
+	       buffer, bufmax);
+
+	dlen = sizeof(vnode->fid.unique) + sizeof(vnode->status.version);
+	if (dlen > bufmax)
+		return 0;
+
+	memcpy(buffer, &vnode->fid.unique, sizeof(vnode->fid.unique));
+	buffer += sizeof(vnode->fid.unique);
+	memcpy(buffer, &vnode->status.version, sizeof(vnode->status.version));
+
+	_leave(" = %u", dlen);
+	return dlen;
+
+} /* end afs_vnode_cache_get_aux() */
+#endif
+
+/*****************************************************************************/
+/*
+ * check that the auxilliary data indicates that the entry is still valid
+ */
+#ifdef CONFIG_AFS_FSCACHE
+static fscache_checkaux_t afs_vnode_cache_check_aux(void *cookie_netfs_data,
+						    const void *buffer,
+						    uint16_t buflen)
+{
+	struct afs_vnode *vnode = cookie_netfs_data;
+	uint16_t dlen;
+
+	_enter("{%x,%x,%Lx},%p,%u",
+	       vnode->fid.vnode, vnode->fid.unique, vnode->status.version,
+	       buffer, buflen);
+
+	/* check the size of the data is what we're expecting */
+	dlen = sizeof(vnode->fid.unique) + sizeof(vnode->status.version);
+	if (dlen != buflen) {
+		_leave(" = OBSOLETE [len %hx != %hx]", dlen, buflen);
+		return FSCACHE_CHECKAUX_OBSOLETE;
 	}
 
-	if (vnode->fid.unique != cvnode->vnode_unique ||
-	    vnode->status.version != cvnode->data_version) {
-		_leave(" = DELETE");
-		return CACHEFS_MATCH_SUCCESS_DELETE;
+	if (memcmp(buffer,
+		   &vnode->fid.unique,
+		   sizeof(vnode->fid.unique)
+		   ) != 0
+	    ) {
+		unsigned unique;
+
+		memcpy(&unique, buffer, sizeof(unique));
+
+		_leave(" = OBSOLETE [uniq %x != %x]",
+		       unique, vnode->fid.unique);
+		return FSCACHE_CHECKAUX_OBSOLETE;
+	}
+
+	if (memcmp(buffer + sizeof(vnode->fid.unique),
+		   &vnode->status.version,
+		   sizeof(vnode->status.version)
+		   ) != 0
+	    ) {
+		afs_dataversion_t version;
+
+		memcpy(&version, buffer + sizeof(vnode->fid.unique),
+		       sizeof(version));
+
+		_leave(" = OBSOLETE [vers %llx != %llx]",
+		       version, vnode->status.version);
+		return FSCACHE_CHECKAUX_OBSOLETE;
 	}
 
 	_leave(" = SUCCESS");
-	return CACHEFS_MATCH_SUCCESS;
-} /* end afs_vnode_cache_match() */
+	return FSCACHE_CHECKAUX_OKAY;
+
+} /* end afs_vnode_cache_check_aux() */
 #endif
 
 /*****************************************************************************/
 /*
- * update a vnode record stored in the cache
+ * indication of pages that now have cache metadata retained
+ * - this function should mark the specified pages as now being cached
  */
-#ifdef AFS_CACHING_SUPPORT
-static void afs_vnode_cache_update(void *source, void *entry)
+#ifdef CONFIG_AFS_FSCACHE
+static void afs_vnode_cache_mark_pages_cached(void *cookie_netfs_data,
+					      struct address_space *mapping,
+					      struct pagevec *cached_pvec)
 {
-	struct afs_cache_vnode *cvnode = entry;
-	struct afs_vnode *vnode = source;
+	unsigned long loop;
+
+	for (loop = 0; loop < cached_pvec->nr; loop++) {
+		struct page *page = cached_pvec->pages[loop];
 
-	_enter("");
+		_debug("- mark %p{%lx}", page, page->index);
 
-	cvnode->vnode_id	= vnode->fid.vnode;
-	cvnode->vnode_unique	= vnode->fid.unique;
-	cvnode->data_version	= vnode->status.version;
+		SetPagePrivate(page);
+	}
+
+} /* end afs_vnode_cache_mark_pages_cached() */
+#endif
+
+/*****************************************************************************/
+/*
+ * indication the cookie is no longer uncached
+ * - this function is called when the backing store currently caching a cookie
+ *   is removed
+ * - the netfs should use this to clean up any markers indicating cached pages
+ * - this is mandatory for any object that may have data
+ */
+#ifdef CONFIG_AFS_FSCACHE
+static void afs_vnode_cache_now_uncached(void *cookie_netfs_data)
+{
+	struct afs_vnode *vnode = cookie_netfs_data;
+	struct pagevec pvec;
+	pgoff_t first;
+	int loop, nr_pages;
+
+	_enter("{%x,%x,%Lx}",
+	       vnode->fid.vnode, vnode->fid.unique, vnode->status.version);
+
+	pagevec_init(&pvec, 0);
+	first = 0;
+
+	for (;;) {
+		/* grab a bunch of pages to clean */
+		nr_pages = pagevec_lookup(&pvec, vnode->vfs_inode.i_mapping,
+					  first,
+					  PAGEVEC_SIZE - pagevec_count(&pvec));
+		if (!nr_pages)
+			break;
+
+		for (loop = 0; loop < nr_pages; loop++)
+			ClearPagePrivate(pvec.pages[loop]);
+
+		first = pvec.pages[nr_pages - 1]->index + 1;
+
+		pvec.nr = nr_pages;
+		pagevec_release(&pvec);
+		cond_resched();
+	}
+
+	_leave("");
 
-} /* end afs_vnode_cache_update() */
+} /* end afs_vnode_cache_now_uncached() */
 #endif
diff --git a/fs/afs/vnode.h b/fs/afs/vnode.h
index b86a971..3f0602d 100644
--- a/fs/afs/vnode.h
+++ b/fs/afs/vnode.h
@@ -13,9 +13,9 @@ #ifndef _LINUX_AFS_VNODE_H
 #define _LINUX_AFS_VNODE_H
 
 #include <linux/fs.h>
+#include <linux/fscache.h>
 #include "server.h"
 #include "kafstimod.h"
-#include "cache.h"
 
 #ifdef __KERNEL__
 
@@ -32,8 +32,8 @@ struct afs_cache_vnode
 	afs_dataversion_t	data_version;	/* data version */
 };
 
-#ifdef AFS_CACHING_SUPPORT
-extern struct cachefs_index_def afs_vnode_cache_index_def;
+#ifdef CONFIG_AFS_FSCACHE
+extern struct fscache_cookie_def afs_vnode_cache_index_def;
 #endif
 
 /*****************************************************************************/
@@ -47,8 +47,8 @@ struct afs_vnode
 	struct afs_volume	*volume;	/* volume on which vnode resides */
 	struct afs_fid		fid;		/* the file identifier for this inode */
 	struct afs_file_status	status;		/* AFS status info for this file */
-#ifdef AFS_CACHING_SUPPORT
-	struct cachefs_cookie	*cache;		/* caching cookie */
+#ifdef CONFIG_AFS_FSCACHE
+	struct fscache_cookie	*cache;		/* caching cookie */
 #endif
 
 	wait_queue_head_t	update_waitq;	/* status fetch waitqueue */
diff --git a/fs/afs/volume.c b/fs/afs/volume.c
index 0ff4b86..0bd5578 100644
--- a/fs/afs/volume.c
+++ b/fs/afs/volume.c
@@ -15,10 +15,10 @@ #include <linux/init.h>
 #include <linux/slab.h>
 #include <linux/fs.h>
 #include <linux/pagemap.h>
+#include <linux/fscache.h>
 #include "volume.h"
 #include "vnode.h"
 #include "cell.h"
-#include "cache.h"
 #include "cmservice.h"
 #include "fsclient.h"
 #include "vlclient.h"
@@ -28,18 +28,14 @@ #ifdef __KDEBUG
 static const char *afs_voltypes[] = { "R/W", "R/O", "BAK" };
 #endif
 
-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_volume_cache_match(void *target,
-						  const void *entry);
-static void afs_volume_cache_update(void *source, void *entry);
-
-struct cachefs_index_def afs_volume_cache_index_def = {
-	.name		= "volume",
-	.data_size	= sizeof(struct afs_cache_vhash),
-	.keys[0]	= { CACHEFS_INDEX_KEYS_BIN, 1 },
-	.keys[1]	= { CACHEFS_INDEX_KEYS_BIN, 1 },
-	.match		= afs_volume_cache_match,
-	.update		= afs_volume_cache_update,
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_volume_cache_get_key(const void *cookie_netfs_data,
+					 void *buffer, uint16_t buflen);
+
+static struct fscache_cookie_def afs_volume_cache_index_def = {
+	.name		= "AFS.volume",
+	.type		= FSCACHE_COOKIE_TYPE_INDEX,
+	.get_key	= afs_volume_cache_get_key,
 };
 #endif
 
@@ -214,11 +210,10 @@ int afs_volume_lookup(const char *name, 
 	}
 
 	/* attach the cache and volume location */
-#ifdef AFS_CACHING_SUPPORT
-	cachefs_acquire_cookie(vlocation->cache,
-			       &afs_vnode_cache_index_def,
-			       volume,
-			       &volume->cache);
+#ifdef CONFIG_AFS_FSCACHE
+	volume->cache = fscache_acquire_cookie(vlocation->cache,
+					       &afs_volume_cache_index_def,
+					       volume);
 #endif
 
 	afs_get_vlocation(vlocation);
@@ -286,8 +281,8 @@ void afs_put_volume(struct afs_volume *v
 	up_write(&vlocation->cell->vl_sem);
 
 	/* finish cleaning up the volume */
-#ifdef AFS_CACHING_SUPPORT
-	cachefs_relinquish_cookie(volume->cache, 0);
+#ifdef CONFIG_AFS_FSCACHE
+	fscache_relinquish_cookie(volume->cache, 0);
 #endif
 	afs_put_vlocation(vlocation);
 
@@ -481,40 +476,25 @@ int afs_volume_release_fileserver(struct
 
 /*****************************************************************************/
 /*
- * match a volume hash record stored in the cache
+ * set the key for the index entry
  */
-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_volume_cache_match(void *target,
-						  const void *entry)
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_volume_cache_get_key(const void *cookie_netfs_data,
+					void *buffer, uint16_t bufmax)
 {
-	const struct afs_cache_vhash *vhash = entry;
-	struct afs_volume *volume = target;
-
-	_enter("{%u},{%u}", volume->type, vhash->vtype);
+	const struct afs_volume *volume = cookie_netfs_data;
+	uint16_t klen;
 
-	if (volume->type == vhash->vtype) {
-		_leave(" = SUCCESS");
-		return CACHEFS_MATCH_SUCCESS;
-	}
-
-	_leave(" = FAILED");
-	return CACHEFS_MATCH_FAILED;
-} /* end afs_volume_cache_match() */
-#endif
+	_enter("{%u},%p,%u", volume->type, buffer, bufmax);
 
-/*****************************************************************************/
-/*
- * update a volume hash record stored in the cache
- */
-#ifdef AFS_CACHING_SUPPORT
-static void afs_volume_cache_update(void *source, void *entry)
-{
-	struct afs_cache_vhash *vhash = entry;
-	struct afs_volume *volume = source;
+	klen = sizeof(volume->type);
+	if (klen > bufmax)
+		return 0;
 
-	_enter("");
+	memcpy(buffer, &volume->type, sizeof(volume->type));
 
-	vhash->vtype = volume->type;
+	_leave(" = %u", klen);
+	return klen;
 
-} /* end afs_volume_cache_update() */
+} /* end afs_volume_cache_get_key() */
 #endif
diff --git a/fs/afs/volume.h b/fs/afs/volume.h
index bfdcf19..fc9895a 100644
--- a/fs/afs/volume.h
+++ b/fs/afs/volume.h
@@ -12,11 +12,11 @@
 #ifndef _LINUX_AFS_VOLUME_H
 #define _LINUX_AFS_VOLUME_H
 
+#include <linux/fscache.h>
 #include "types.h"
 #include "fsclient.h"
 #include "kafstimod.h"
 #include "kafsasyncd.h"
-#include "cache.h"
 
 typedef enum {
 	AFS_VLUPD_SLEEP,		/* sleeping waiting for update timer to fire */
@@ -45,24 +45,6 @@ #define AFS_VOL_VTM_BAK	0x04 /* backup v
 	time_t			rtime;		/* last retrieval time */
 };
 
-#ifdef AFS_CACHING_SUPPORT
-extern struct cachefs_index_def afs_vlocation_cache_index_def;
-#endif
-
-/*****************************************************************************/
-/*
- * volume -> vnode hash table entry
- */
-struct afs_cache_vhash
-{
-	afs_voltype_t		vtype;		/* which volume variation */
-	uint8_t			hash_bucket;	/* which hash bucket this represents */
-} __attribute__((packed));
-
-#ifdef AFS_CACHING_SUPPORT
-extern struct cachefs_index_def afs_volume_cache_index_def;
-#endif
-
 /*****************************************************************************/
 /*
  * AFS volume location record
@@ -73,8 +55,8 @@ struct afs_vlocation
 	struct list_head	link;		/* link in cell volume location list */
 	struct afs_timer	timeout;	/* decaching timer */
 	struct afs_cell		*cell;		/* cell to which volume belongs */
-#ifdef AFS_CACHING_SUPPORT
-	struct cachefs_cookie	*cache;		/* caching cookie */
+#ifdef CONFIG_AFS_FSCACHE
+	struct fscache_cookie	*cache;		/* caching cookie */
 #endif
 	struct afs_cache_vlocation vldb;	/* volume information DB record */
 	struct afs_volume	*vols[3];	/* volume access record pointer (index by type) */
@@ -109,8 +91,8 @@ struct afs_volume
 	atomic_t		usage;
 	struct afs_cell		*cell;		/* cell to which belongs (unrefd ptr) */
 	struct afs_vlocation	*vlocation;	/* volume location */
-#ifdef AFS_CACHING_SUPPORT
-	struct cachefs_cookie	*cache;		/* caching cookie */
+#ifdef CONFIG_AFS_FSCACHE
+	struct fscache_cookie	*cache;		/* caching cookie */
 #endif
 	afs_volid_t		vid;		/* volume ID */
 	afs_voltype_t		type;		/* type of volume */

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 5/7] NFS: Use local caching [try #12]
  2006-08-18 15:35 [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #12] David Howells
                   ` (3 preceding siblings ...)
  2006-08-18 15:35 ` [PATCH 4/7] FS-Cache: Make kAFS use FS-Cache " David Howells
@ 2006-08-18 15:35 ` David Howells
  2006-08-18 15:43   ` Chuck Lever
  2006-08-18 15:51   ` David Howells
  2006-08-18 15:35 ` [PATCH 6/7] FS-Cache: CacheFiles: ia64: missing copy_page export " David Howells
  2006-08-18 15:35 ` [PATCH 7/7] FS-Cache: CacheFiles: A cache that backs onto a mounted filesystem " David Howells
  6 siblings, 2 replies; 11+ messages in thread
From: David Howells @ 2006-08-18 15:35 UTC (permalink / raw)
  To: torvalds, akpm, steved, trond.myklebust
  Cc: linux-fsdevel, linux-cachefs, nfsv4, linux-kernel

The attached patch makes it possible for the NFS filesystem to make use of the
network filesystem local caching service (FS-Cache).

To be able to use this, an updated mount program is required.  This can be
obtained from:

	http://people.redhat.com/steved/cachefs/util-linux/

To mount an NFS filesystem to use caching, add an "fsc" option to the mount:

	mount warthog:/ /a -o fsc

Signed-Off-By: David Howells <dhowells@redhat.com>
---

 fs/Kconfig                 |    7 +
 fs/nfs/Makefile            |    1 
 fs/nfs/client.c            |   11 +
 fs/nfs/file.c              |   49 ++++-
 fs/nfs/fscache.c           |  348 ++++++++++++++++++++++++++++++++
 fs/nfs/fscache.h           |  476 ++++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/inode.c             |   21 ++
 fs/nfs/internal.h          |   32 +++
 fs/nfs/pagelist.c          |    3 
 fs/nfs/read.c              |   30 +++
 fs/nfs/super.c             |    1 
 fs/nfs/sysctl.c            |   43 ++++
 fs/nfs/write.c             |   11 +
 include/linux/nfs4_mount.h |    1 
 include/linux/nfs_fs.h     |    5 
 include/linux/nfs_fs_sb.h  |    5 
 include/linux/nfs_mount.h  |    1 
 17 files changed, 1035 insertions(+), 10 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index eecc0ed..36d0051 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -1485,6 +1485,13 @@ config NFS_V4
 
 	  If unsure, say N.
 
+config NFS_FSCACHE
+	bool "Provide NFS client caching support (EXPERIMENTAL)"
+	depends on NFS_FS && FSCACHE && EXPERIMENTAL
+	help
+	  Say Y here if you want NFS data to be cached locally on disc through
+	  the general filesystem cache manager
+
 config NFS_DIRECTIO
 	bool "Allow direct I/O on NFS files (EXPERIMENTAL)"
 	depends on NFS_FS && EXPERIMENTAL
diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index f4580b4..2af6f22 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -16,4 +16,5 @@ nfs-$(CONFIG_NFS_V4)	+= nfs4proc.o nfs4x
 			   nfs4namespace.o
 nfs-$(CONFIG_NFS_DIRECTIO) += direct.o
 nfs-$(CONFIG_SYSCTL) += sysctl.o
+nfs-$(CONFIG_NFS_FSCACHE) += fscache.o
 nfs-objs		:= $(nfs-y)
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 471d975..dd4ff23 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -150,6 +150,8 @@ #ifdef CONFIG_NFS_V4
 	clp->cl_state = 1 << NFS4CLNT_LEASE_EXPIRED;
 #endif
 
+	nfs_fscache_get_client_cookie(clp);
+
 	return clp;
 
 error_3:
@@ -187,6 +189,8 @@ #ifdef CONFIG_NFS_V4
 	}
 #endif
 
+	nfs_fscache_release_client_cookie(clp);
+
 	/* -EIO all pending I/O */
 	if (!IS_ERR(clp->cl_rpcclient))
 		rpc_shutdown_client(clp->cl_rpcclient);
@@ -1363,7 +1367,7 @@ static int nfs_volume_list_show(struct s
 
 	/* display header on line 1 */
 	if (v == SEQ_START_TOKEN) {
-		seq_puts(m, "NV SERVER   PORT DEV     FSID\n");
+		seq_puts(m, "NV SERVER   PORT DEV     FSID              FSC\n");
 		return 0;
 	}
 	/* display one transport per line on subsequent lines */
@@ -1376,12 +1380,13 @@ static int nfs_volume_list_show(struct s
 	snprintf(fsid, 17, "%llx:%llx",
 		 server->fsid.major, server->fsid.minor);
 
-	seq_printf(m, "v%d %02x%02x%02x%02x %4hx %-7s %-17s\n",
+	seq_printf(m, "v%d %02x%02x%02x%02x %4hx %-7s %-17s %s\n",
 		   clp->cl_nfsversion,
 		   NIPQUAD(clp->cl_addr.sin_addr),
 		   ntohs(clp->cl_addr.sin_port),
 		   dev,
-		   fsid);
+		   fsid,
+		   nfs_server_fscache_state(server));
 
 	return 0;
 }
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index a146ed3..b7ab97c 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -27,12 +27,14 @@ #include <linux/mm.h>
 #include <linux/slab.h>
 #include <linux/pagemap.h>
 #include <linux/smp_lock.h>
+#include <linux/buffer_head.h>
 
 #include <asm/uaccess.h>
 #include <asm/system.h>
 
 #include "delegation.h"
 #include "iostat.h"
+#include "internal.h"
 
 #define NFSDBG_FACILITY		NFSDBG_FILE
 
@@ -249,6 +251,10 @@ nfs_file_mmap(struct file * file, struct
 	status = nfs_revalidate_mapping(inode, file->f_mapping);
 	if (!status)
 		status = generic_file_mmap(file, vma);
+
+	if (status == 0)
+		nfs_fscache_install_vm_ops(inode, vma);
+
 	return status;
 }
 
@@ -301,6 +307,12 @@ static int nfs_commit_write(struct file 
 	return status;
 }
 
+/*
+ * partially or wholly invalidate a page
+ * - release the private state associated with a page if undergoing complete
+ *   page invalidation
+ * - caller holds page lock
+ */
 static void nfs_invalidate_page(struct page *page, unsigned long offset)
 {
 	struct inode *inode = page->mapping->host;
@@ -308,19 +320,47 @@ static void nfs_invalidate_page(struct p
 	/* Cancel any unstarted writes on this page */
 	if (offset == 0)
 		nfs_sync_inode_wait(inode, page->index, 1, FLUSH_INVALIDATE);
+
+	nfs_fscache_invalidate_page(page, inode, offset);
+
+	/* we can do this here as the bits are only set with the page lock
+	 * held, and our caller is holding that */
+	if (!page->private)
+		ClearPagePrivate(page);
 }
 
+/*
+ * release the private state associated with a page, if the page isn't busy
+ * - caller holds page lock
+ * - return true (may release) or false (may not)
+ */
 static int nfs_release_page(struct page *page, gfp_t gfp)
 {
-	if (gfp & __GFP_FS)
-		return !nfs_wb_page(page->mapping->host, page);
-	else
+	if ((gfp & __GFP_FS) == 0) {
 		/*
 		 * Avoid deadlock on nfs_wait_on_request().
 		 */
 		return 0;
+	}
+
+	if (nfs_wb_page(page->mapping->host, page) < 0)
+		return 0;
+
+	if (nfs_fscache_release_page(page) < 0)
+		return 0;
+
+	/* PG_private may have been set due to either caching or writing */
+	BUG_ON(page->private != 0);
+	ClearPagePrivate(page);
+
+	return 1;
 }
 
+/*
+ * Since we use page->private for our own nefarious purposes when using
+ * fscache, we have to override extra address space ops to prevent fs/buffer.c
+ * from getting confused, even though we may not have asked its opinion
+ */
 const struct address_space_operations nfs_file_aops = {
 	.readpage = nfs_readpage,
 	.readpages = nfs_readpages,
@@ -334,6 +374,9 @@ const struct address_space_operations nf
 #ifdef CONFIG_NFS_DIRECTIO
 	.direct_IO = nfs_direct_IO,
 #endif
+#ifdef CONFIG_NFS_FSCACHE
+	.sync_page	= block_sync_page,
+#endif
 };
 
 /* 
diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
new file mode 100644
index 0000000..94d5e3a
--- /dev/null
+++ b/fs/nfs/fscache.c
@@ -0,0 +1,348 @@
+/* fscache.c: NFS filesystem cache interface
+ *
+ * Copyright (C) 2006 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+
+#include <linux/config.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/nfs_fs.h>
+#include <linux/nfs_fs_sb.h>
+#include <linux/in6.h>
+
+#include "internal.h"
+
+/*
+ * Sysctl variables
+ */
+atomic_t nfs_fscache_to_pages;
+atomic_t nfs_fscache_from_pages;
+atomic_t nfs_fscache_uncache_page;
+int nfs_fscache_from_error;
+int nfs_fscache_to_error;
+
+#define NFSDBG_FACILITY		NFSDBG_FSCACHE
+
+/* the auxiliary data in the cache (used for coherency management) */
+struct nfs_fh_auxdata {
+	struct timespec	i_mtime;
+	struct timespec	i_ctime;
+	loff_t		i_size;
+};
+
+static struct fscache_netfs_operations nfs_cache_ops = {
+};
+
+struct fscache_netfs nfs_cache_netfs = {
+	.name			= "nfs",
+	.version		= 0,
+	.ops			= &nfs_cache_ops,
+};
+
+static const uint8_t nfs_cache_ipv6_wrapper_for_ipv4[12] = {
+	[0 ... 9]	= 0x00,
+	[10 ... 11]	= 0xff
+};
+
+struct nfs_server_key {
+	uint16_t nfsversion;
+	uint16_t port;
+	union {
+		struct {
+			uint8_t		ipv6wrapper[12];
+			struct in_addr	addr;
+		} ipv4_addr;
+		struct in6_addr ipv6_addr;
+	};
+};
+
+static uint16_t nfs_server_get_key(const void *cookie_netfs_data,
+				   void *buffer, uint16_t bufmax)
+{
+	const struct nfs_client *clp = cookie_netfs_data;
+	struct nfs_server_key *key = buffer;
+	uint16_t len = 0;
+
+	key->nfsversion = clp->cl_nfsversion;
+
+	switch (clp->cl_addr.sin_family) {
+	case AF_INET:
+		key->port = clp->cl_addr.sin_port;
+
+		memcpy(&key->ipv4_addr.ipv6wrapper,
+		       &nfs_cache_ipv6_wrapper_for_ipv4,
+		       sizeof(key->ipv4_addr.ipv6wrapper));
+		memcpy(&key->ipv4_addr.addr,
+		       &clp->cl_addr.sin_addr,
+		       sizeof(key->ipv4_addr.addr));
+		len = sizeof(struct nfs_server_key);
+		break;
+
+	case AF_INET6:
+		key->port = clp->cl_addr.sin_port;
+
+		memcpy(&key->ipv6_addr,
+		       &clp->cl_addr.sin_addr,
+		       sizeof(key->ipv6_addr));
+		len = sizeof(struct nfs_server_key);
+		break;
+
+	default:
+		len = 0;
+		printk(KERN_WARNING "NFS: Unknown network family '%d'\n",
+			clp->cl_addr.sin_family);
+		break;
+	}
+
+	return len;
+}
+
+/*
+ * the root index for the filesystem is defined by nfsd IP address and ports
+ */
+struct fscache_cookie_def nfs_cache_server_index_def = {
+	.name		= "NFS.servers",
+	.type 		= FSCACHE_COOKIE_TYPE_INDEX,
+	.get_key	= nfs_server_get_key,
+};
+
+static uint16_t nfs_fh_get_key(const void *cookie_netfs_data,
+		void *buffer, uint16_t bufmax)
+{
+	const struct nfs_inode *nfsi = cookie_netfs_data;
+	uint16_t nsize;
+
+	/* set the file handle */
+	nsize = nfsi->fh.size;
+	memcpy(buffer, nfsi->fh.data, nsize);
+	return nsize;
+}
+
+/*
+ * indication of pages that now have cache metadata retained
+ * - this function should mark the specified pages as now being cached
+ */
+static void nfs_fh_mark_pages_cached(void *cookie_netfs_data,
+				     struct address_space *mapping,
+				     struct pagevec *cached_pvec)
+{
+	struct nfs_inode *nfsi = cookie_netfs_data;
+	unsigned long loop;
+
+	dprintk("NFS: nfs_fh_mark_pages_cached: nfs_inode 0x%p pages %ld\n",
+		nfsi, cached_pvec->nr);
+
+	BUG_ON(!nfsi->fscache);
+
+	for (loop = 0; loop < cached_pvec->nr; loop++)
+		SetPageNfsCached(cached_pvec->pages[loop]);
+}
+
+/*
+ * get an extra reference on a read context
+ * - this function can be absent if the completion function doesn't
+ *   require a context
+ */
+static void nfs_fh_get_context(void *cookie_netfs_data, void *context)
+{
+	get_nfs_open_context(context);
+}
+
+/*
+ * release an extra reference on a read context
+ * - this function can be absent if the completion function doesn't
+ *   require a context
+ */
+static void nfs_fh_put_context(void *cookie_netfs_data, void *context)
+{
+	if (context)
+		put_nfs_open_context(context);
+}
+
+/*
+ * indication the cookie is no longer uncached
+ * - this function is called when the backing store currently caching a cookie
+ *   is removed
+ * - the netfs should use this to clean up any markers indicating cached pages
+ * - this is mandatory for any object that may have data
+ */
+static void nfs_fh_now_uncached(void *cookie_netfs_data)
+{
+	struct nfs_inode *nfsi = cookie_netfs_data;
+	struct pagevec pvec;
+	pgoff_t first;
+	int loop, nr_pages;
+
+	pagevec_init(&pvec, 0);
+	first = 0;
+
+	dprintk("NFS: nfs_fh_now_uncached: nfs_inode 0x%p\n", nfsi);
+
+	for (;;) {
+		/* grab a bunch of pages to clean */
+		nr_pages = pagevec_lookup(&pvec,
+					  nfsi->vfs_inode.i_mapping,
+					  first,
+					  PAGEVEC_SIZE - pagevec_count(&pvec));
+		if (!nr_pages)
+			break;
+
+		for (loop = 0; loop < nr_pages; loop++)
+			ClearPageNfsCached(pvec.pages[loop]);
+
+		first = pvec.pages[nr_pages - 1]->index + 1;
+
+		pvec.nr = nr_pages;
+		pagevec_release(&pvec);
+		cond_resched();
+	}
+}
+
+/*
+ * get certain file attributes from the netfs data
+ * - this function can be absent for an index
+ * - not permitted to return an error
+ * - the netfs data from the cookie being used as the source is
+ *   presented
+ */
+static void nfs_fh_get_attr(const void *cookie_netfs_data, uint64_t *size)
+{
+	const struct nfs_inode *nfsi = cookie_netfs_data;
+
+	*size = nfsi->vfs_inode.i_size;
+}
+
+/*
+ * get the auxilliary data from netfs data
+ * - this function can be absent if the index carries no state data
+ * - should store the auxilliary data in the buffer
+ * - should return the amount of amount stored
+ * - not permitted to return an error
+ * - the netfs data from the cookie being used as the source is
+ *   presented
+ */
+static uint16_t nfs_fh_get_aux(const void *cookie_netfs_data,
+			       void *buffer, uint16_t bufmax)
+{
+	struct nfs_fh_auxdata auxdata;
+	const struct nfs_inode *nfsi = cookie_netfs_data;
+
+	auxdata.i_size = nfsi->vfs_inode.i_size;
+	auxdata.i_mtime = nfsi->vfs_inode.i_mtime;
+	auxdata.i_ctime = nfsi->vfs_inode.i_ctime;
+
+	if (bufmax > sizeof(auxdata))
+		bufmax = sizeof(auxdata);
+
+	memcpy(buffer, &auxdata, bufmax);
+	return bufmax;
+}
+
+/*
+ * consult the netfs about the state of an object
+ * - this function can be absent if the index carries no state data
+ * - the netfs data from the cookie being used as the target is
+ *   presented, as is the auxilliary data
+ */
+static fscache_checkaux_t nfs_fh_check_aux(void *cookie_netfs_data,
+					   const void *data, uint16_t datalen)
+{
+	struct nfs_fh_auxdata auxdata;
+	struct nfs_inode *nfsi = cookie_netfs_data;
+
+	if (datalen > sizeof(auxdata))
+		return FSCACHE_CHECKAUX_OBSOLETE;
+
+	auxdata.i_size = nfsi->vfs_inode.i_size;
+	auxdata.i_mtime = nfsi->vfs_inode.i_mtime;
+	auxdata.i_ctime = nfsi->vfs_inode.i_ctime;
+
+	if (memcmp(data, &auxdata, datalen) != 0)
+		return FSCACHE_CHECKAUX_OBSOLETE;
+
+	return FSCACHE_CHECKAUX_OKAY;
+}
+
+/*
+ * the primary index for each server is simply made up of a series of NFS file
+ * handles
+ */
+struct fscache_cookie_def nfs_cache_fh_index_def = {
+	.name			= "NFS.fh",
+	.type			= FSCACHE_COOKIE_TYPE_DATAFILE,
+	.get_key		= nfs_fh_get_key,
+	.get_attr		= nfs_fh_get_attr,
+	.get_aux		= nfs_fh_get_aux,
+	.check_aux		= nfs_fh_check_aux,
+	.get_context		= nfs_fh_get_context,
+	.put_context		= nfs_fh_put_context,
+	.mark_pages_cached	= nfs_fh_mark_pages_cached,
+	.now_uncached		= nfs_fh_now_uncached,
+};
+
+static int nfs_file_page_mkwrite(struct vm_area_struct *vma, struct page *page)
+{
+	wait_on_page_fs_misc(page);
+	return 0;
+}
+
+struct vm_operations_struct nfs_fs_vm_operations = {
+	.nopage		= filemap_nopage,
+	.populate	= filemap_populate,
+	.page_mkwrite	= nfs_file_page_mkwrite,
+};
+
+/*
+ * handle completion of a page being stored in the cache
+ */
+void nfs_readpage_to_fscache_complete(struct page *page, void *data, int error)
+{
+	dfprintk(FSCACHE,
+		"NFS:     readpage_to_fscache_complete (p:%p(i:%lx f:%lx)/%d)\n",
+		page, page->index, page->flags, error);
+
+	end_page_fs_misc(page);
+}
+
+/*
+ * handle completion of a page being read from the cache
+ * - called in process (keventd) context
+ */
+void nfs_readpage_from_fscache_complete(struct page *page,
+					void *context,
+					int error)
+{
+	dfprintk(FSCACHE,
+		 "NFS: readpage_from_fscache_complete (0x%p/0x%p/%d)\n",
+		 page, context, error);
+
+	/* if the read completes with an error, we just unlock the page and let
+	 * the VM reissue the readpage */
+	if (!error) {
+		SetPageUptodate(page);
+		unlock_page(page);
+	} else {
+		error = nfs_readpage_async(context, page->mapping->host, page);
+		if (error)
+			unlock_page(page);
+	}
+}
+
+/*
+ * handle completion of a page being read from the cache
+ * - really need to synchronise the end of writeback, probably using a page
+ *   flag, but for the moment we disable caching on writable files
+ */
+void nfs_writepage_to_fscache_complete(struct page *page,
+				       void *data,
+				       int error)
+{
+}
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
new file mode 100644
index 0000000..48a993a
--- /dev/null
+++ b/fs/nfs/fscache.h
@@ -0,0 +1,476 @@
+/* fscache.h: NFS filesystem cache interface definitions
+ *
+ * Copyright (C) 2006 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _NFS_FSCACHE_H
+#define _NFS_FSCACHE_H
+
+#include <linux/nfs_fs.h>
+#include <linux/nfs_mount.h>
+#include <linux/nfs4_mount.h>
+
+#ifdef CONFIG_NFS_FSCACHE
+#include <linux/fscache.h>
+
+extern struct fscache_netfs nfs_cache_netfs;
+extern struct fscache_cookie_def nfs_cache_server_index_def;
+extern struct fscache_cookie_def nfs_cache_fh_index_def;
+extern struct vm_operations_struct nfs_fs_vm_operations;
+
+extern void nfs_invalidatepage(struct page *, unsigned long);
+extern int nfs_releasepage(struct page *, gfp_t);
+
+extern atomic_t nfs_fscache_to_pages;
+extern atomic_t nfs_fscache_from_pages;
+extern atomic_t nfs_fscache_uncache_page;
+extern int nfs_fscache_from_error;
+extern int nfs_fscache_to_error;
+
+/*
+ * register NFS for caching
+ */
+static inline int nfs_fscache_register(void)
+{
+	return fscache_register_netfs(&nfs_cache_netfs);
+}
+
+/*
+ * unregister NFS for caching
+ */
+static inline void nfs_fscache_unregister(void)
+{
+	fscache_unregister_netfs(&nfs_cache_netfs);
+}
+
+/*
+ * get the per-client index cookie for an NFS client if the appropriate mount
+ * flag was set
+ * - we always try and get an index cookie for the client, but get filehandle
+ *   cookies on a per-superblock basis, depending on the mount flags
+ */
+static inline void nfs_fscache_get_client_cookie(struct nfs_client *clp)
+{
+	/* create a cache index for looking up filehandles */
+	clp->fscache = fscache_acquire_cookie(nfs_cache_netfs.primary_index,
+					      &nfs_cache_server_index_def,
+					      clp);
+	dfprintk(FSCACHE,"NFS: get client cookie (0x%p/0x%p)\n",
+		 clp, clp->fscache);
+}
+
+/*
+ * dispose of a per-client cookie
+ */
+static inline void nfs_fscache_release_client_cookie(struct nfs_client *clp)
+{
+	dfprintk(FSCACHE,"NFS: releasing client cookie (0x%p/0x%p)\n",
+		clp, clp->fscache);
+
+	fscache_relinquish_cookie(clp->fscache, 0);
+	clp->fscache = NULL;
+}
+
+/*
+ * indicate the client caching state as readable text
+ */
+static inline const char *nfs_server_fscache_state(struct nfs_server *server)
+{
+	if (server->nfs_client->fscache && (server->flags & NFS_MOUNT_FSCACHE))
+		return "yes";
+	return "no ";
+}
+
+/*
+ * get the per-filehandle cookie for an NFS inode
+ */
+static inline void nfs_fscache_get_fh_cookie(struct super_block *sb,
+					     struct nfs_inode *nfsi,
+					     int maycache)
+{
+	nfsi->fscache = NULL;
+	if (maycache && (NFS_SB(sb)->flags & NFS_MOUNT_FSCACHE)) {
+		nfsi->fscache = fscache_acquire_cookie(
+			NFS_SB(sb)->nfs_client->fscache,
+			&nfs_cache_fh_index_def,
+			nfsi);
+
+		fscache_set_i_size(nfsi->fscache, nfsi->vfs_inode.i_size);
+
+		dfprintk(FSCACHE, "NFS: get FH cookie (0x%p/0x%p/0x%p)\n",
+			 sb, nfsi, nfsi->fscache);
+	}
+}
+
+/*
+ * change the filesize associated with a per-filehandle cookie
+ */
+static inline void nfs_fscache_set_size(struct nfs_server *server,
+					struct nfs_inode *nfsi,
+					loff_t i_size)
+{
+	fscache_set_i_size(nfsi->fscache, i_size);
+}
+
+/*
+ * replace a per-filehandle cookie due to revalidation detecting a file having
+ * changed on the server
+ */
+static inline void nfs_fscache_renew_fh_cookie(struct nfs_server *server,
+					       struct nfs_inode *nfsi)
+{
+	struct fscache_cookie *old = nfsi->fscache;
+
+	if (nfsi->fscache) {
+		/* retire the current fscache cache and get a new one */
+		fscache_relinquish_cookie(nfsi->fscache, 1);
+
+		nfsi->fscache = fscache_acquire_cookie(
+			server->nfs_client->fscache,
+			&nfs_cache_fh_index_def,
+			nfsi);
+		fscache_set_i_size(nfsi->fscache, nfsi->vfs_inode.i_size);
+
+		dfprintk(FSCACHE,
+			 "NFS: revalidation new cookie (0x%p/0x%p/0x%p/0x%p)\n",
+			 server, nfsi, old, nfsi->fscache);
+	}
+}
+
+/*
+ * release a per-filehandle cookie
+ */
+static inline void nfs_fscache_release_fh_cookie(struct nfs_server *server,
+						 struct nfs_inode *nfsi)
+{
+	dfprintk(FSCACHE, "NFS: clear cookie (0x%p/0x%p)\n",
+		 nfsi, nfsi->fscache);
+
+	fscache_relinquish_cookie(nfsi->fscache, 0);
+	nfsi->fscache = NULL;
+}
+
+/*
+ * retire a per-filehandle cookie, destroying the data attached to it
+ */
+static inline void nfs_fscache_zap_fh_cookie(struct nfs_server *server,
+					     struct nfs_inode *nfsi)
+{
+	dfprintk(FSCACHE,"NFS: zapping cookie (0x%p/0x%p)\n",
+		nfsi, nfsi->fscache);
+
+	fscache_relinquish_cookie(nfsi->fscache, 1);
+	nfsi->fscache = NULL;
+}
+
+/*
+ * turn off the cache with regard to a filehandle cookie if opened for writing,
+ * invalidating all the pages in the page cache relating to the associated
+ * inode to clear the per-page caching
+ */
+static inline void nfs_fscache_disable_fh_cookie(struct inode *inode)
+{
+	if (NFS_I(inode)->fscache) {
+		dfprintk(FSCACHE,
+			 "NFS: nfsi 0x%p turning cache off\n", NFS_I(inode));
+
+		/* Need to invalided any mapped pages that were read in before
+		 * turning off the cache.
+		 */
+		if (inode->i_mapping && inode->i_mapping->nrpages)
+			invalidate_inode_pages2(inode->i_mapping);
+
+		nfs_fscache_zap_fh_cookie(NFS_SERVER(inode), NFS_I(inode));
+	}
+}
+
+/*
+ * install the VM ops for mmap() of an NFS file so that we can hold up writes
+ * to pages on shared writable mappings until the store to the cache is
+ * complete
+ */
+static inline void nfs_fscache_install_vm_ops(struct inode *inode,
+					      struct vm_area_struct *vma)
+{
+	if (NFS_I(inode)->fscache)
+		vma->vm_ops = &nfs_fs_vm_operations;
+}
+
+/*
+ * release the caching state associated with a page, if the page isn't busy
+ * interacting with the cache
+ */
+static inline int nfs_fscache_release_page(struct page *page)
+{
+	if (PageFsMisc(page))
+		return -EBUSY;
+
+	if (PageNfsCached(page)) {
+		struct nfs_inode *nfsi = NFS_I(page->mapping->host);
+
+		BUG_ON(!nfsi->fscache);
+
+		dfprintk(FSCACHE, "NFS: fscache releasepage (0x%p/0x%p/0x%p)\n",
+			 nfsi->fscache, page, nfsi);
+
+		fscache_uncache_page(nfsi->fscache, page);
+		atomic_inc(&nfs_fscache_uncache_page);
+		ClearPageNfsCached(page);
+	}
+
+	return 0;
+}
+
+/*
+ * release the caching state associated with a page if undergoing complete page
+ * invalidation
+ */
+static inline void nfs_fscache_invalidate_page(struct page *page,
+					       struct inode *inode,
+					       unsigned long offset)
+{
+	struct nfs_inode *nfsi = NFS_I(page->mapping->host);
+
+	if (PageNfsCached(page)) {
+		BUG_ON(!nfsi->fscache);
+
+		dfprintk(FSCACHE,
+			 "NFS: fscache invalidatepage (0x%p/0x%p/0x%p)\n",
+			 nfsi->fscache, page, nfsi);
+
+		wait_on_page_fs_misc(page);
+
+		if (offset == 0) {
+			BUG_ON(!PageLocked(page));
+			if (!PageWriteback(page)) {
+				fscache_uncache_page(nfsi->fscache, page);
+				atomic_inc(&nfs_fscache_uncache_page);
+				ClearPageNfsCached(page);
+			}
+		}
+	}
+}
+
+/*
+ * store a newly fetched page in fscache
+ */
+extern void nfs_readpage_to_fscache_complete(struct page *, void *, int);
+
+static inline void nfs_readpage_to_fscache(struct inode *inode,
+					   struct page *page,
+					   int sync)
+{
+	int ret;
+
+	if (PageNfsCached(page)) {
+		dfprintk(FSCACHE,
+			 "NFS: "
+			 "readpage_to_fscache(fsc:%p/p:%p(i:%lx f:%lx)/%d)\n",
+			 NFS_I(inode)->fscache, page, page->index, page->flags,
+			 sync);
+
+		if (TestSetPageFsMisc(page))
+			BUG();
+
+		ret = fscache_write_page(NFS_I(inode)->fscache, page,
+					 nfs_readpage_to_fscache_complete,
+					 NULL, GFP_KERNEL);
+		dfprintk(FSCACHE,
+			 "NFS:     "
+			 "readpage_to_fscache: p:%p(i:%lu f:%lx) ret %d\n",
+			 page, page->index, page->flags, ret);
+
+		if (ret != 0) {
+			fscache_uncache_page(NFS_I(inode)->fscache, page);
+			atomic_inc(&nfs_fscache_uncache_page);
+			ClearPageNfsCached(page);
+			end_page_fs_misc(page);
+			nfs_fscache_to_error = ret;
+		} else {
+			atomic_inc(&nfs_fscache_to_pages);
+		}
+	}
+}
+
+/*
+ * retrieve a page from fscache
+ */
+extern void nfs_readpage_from_fscache_complete(struct page *, void *, int);
+
+static inline
+int nfs_readpage_from_fscache(struct nfs_open_context *ctx,
+			      struct inode *inode,
+			      struct page *page)
+{
+	int ret;
+
+	if (!NFS_I(inode)->fscache)
+		return 1;
+
+	dfprintk(FSCACHE,
+		 "NFS: readpage_from_fscache(fsc:%p/p:%p(i:%lx f:%lx)/0x%p)\n",
+		 NFS_I(inode)->fscache, page, page->index, page->flags, inode);
+
+	ret = fscache_read_or_alloc_page(NFS_I(inode)->fscache,
+					 page,
+					 nfs_readpage_from_fscache_complete,
+					 ctx,
+					 GFP_KERNEL);
+
+	switch (ret) {
+	case 0: /* read BIO submitted (page in fscache) */
+		dfprintk(FSCACHE,
+			 "NFS:    readpage_from_fscache: BIO submitted\n");
+		atomic_inc(&nfs_fscache_from_pages);
+		return ret;
+
+	case -ENOBUFS: /* inode not in cache */
+	case -ENODATA: /* page not in cache */
+		dfprintk(FSCACHE,
+			 "NFS:    readpage_from_fscache error %d\n", ret);
+		return 1;
+
+	default:
+		dfprintk(FSCACHE, "NFS:    readpage_from_fscache %d\n", ret);
+		nfs_fscache_from_error = ret;
+	}
+	return ret;
+}
+
+/*
+ * retrieve a set of pages from fscache
+ */
+static inline int nfs_readpages_from_fscache(struct nfs_open_context *ctx,
+					     struct inode *inode,
+					     struct address_space *mapping,
+					     struct list_head *pages,
+					     unsigned *nr_pages)
+{
+	int ret, npages = *nr_pages;
+
+	if (!NFS_I(inode)->fscache)
+		return 1;
+
+	dfprintk(FSCACHE,
+		 "NFS: nfs_getpages_from_fscache (0x%p/%u/0x%p)\n",
+		 NFS_I(inode)->fscache, *nr_pages, inode);
+
+	ret = fscache_read_or_alloc_pages(NFS_I(inode)->fscache,
+					  mapping, pages, nr_pages,
+					  nfs_readpage_from_fscache_complete,
+					  ctx,
+					  mapping_gfp_mask(mapping));
+
+
+	switch (ret) {
+	case 0: /* read BIO submitted (page in fscache) */
+		BUG_ON(!list_empty(pages));
+		BUG_ON(*nr_pages != 0);
+		dfprintk(FSCACHE,
+			 "NFS: nfs_getpages_from_fscache: BIO submitted\n");
+
+		atomic_add(npages, &nfs_fscache_from_pages);
+		return ret;
+
+	case -ENOBUFS: /* inode not in cache */
+	case -ENODATA: /* page not in cache */
+		dfprintk(FSCACHE,
+			 "NFS: nfs_getpages_from_fscache: no page: %d\n", ret);
+		return 1;
+
+	default:
+		dfprintk(FSCACHE,
+			 "NFS: nfs_getpages_from_fscache: ret  %d\n", ret);
+		nfs_fscache_from_error = ret;
+	}
+
+	return ret;
+}
+
+/*
+ * store an updated page in fscache
+ */
+extern void nfs_writepage_to_fscache_complete(struct page *page, void *data, int error);
+
+static inline void nfs_writepage_to_fscache(struct inode *inode,
+					    struct page *page)
+{
+	int error;
+
+	if (PageNfsCached(page) && NFS_I(inode)->fscache) {
+		dfprintk(FSCACHE,
+			 "NFS: writepage_to_fscache (0x%p/0x%p/0x%p)\n",
+			 NFS_I(inode)->fscache, page, inode);
+
+		error = fscache_write_page(NFS_I(inode)->fscache, page,
+					   nfs_writepage_to_fscache_complete,
+					   NULL, GFP_KERNEL);
+		if (error != 0) {
+			dfprintk(FSCACHE,
+				 "NFS:    fscache_write_page error %d\n",
+				 error);
+			fscache_uncache_page(NFS_I(inode)->fscache, page);
+		}
+	}
+}
+
+#else /* CONFIG_NFS_FSCACHE */
+static inline int nfs_fscache_register(void) { return 0; }
+static inline void nfs_fscache_unregister(void) {}
+static inline void nfs_fscache_get_client_cookie(struct nfs_client *clp) {}
+static inline void nfs4_fscache_get_client_cookie(struct nfs_client *clp) {}
+static inline void nfs_fscache_release_client_cookie(struct nfs_client *clp) {}
+static inline const char *nfs_server_fscache_state(struct nfs_server *server) { return "no "; }
+
+static inline void nfs_fscache_get_fh_cookie(struct super_block *sb,
+					     struct nfs_inode *nfsi,
+					     int maycache)
+{
+}
+static inline void nfs_fscache_set_size(struct nfs_server *server,
+					struct nfs_inode *nfsi,
+					loff_t i_size)
+{
+}
+static inline void nfs_fscache_release_fh_cookie(struct nfs_server *server,
+						 struct nfs_inode *nfsi)
+{
+}
+static inline void nfs_fscache_zap_fh_cookie(struct nfs_server *server, struct nfs_inode *nfsi) {}
+static inline void nfs_fscache_renew_fh_cookie(struct nfs_server *server, struct nfs_inode *nfsi) {}
+static inline void nfs_fscache_disable_fh_cookie(struct inode *inode) {}
+static inline void nfs_fscache_install_vm_ops(struct inode *inode, struct vm_area_struct *vma) {}
+static inline void nfs_fscache_release_page(struct page *page) {}
+static inline void nfs_fscache_invalidate_page(struct page *page,
+					       struct inode *inode,
+					       unsigned long offset)
+{
+}
+static inline void nfs_readpage_to_fscache(struct inode *inode, struct page *page, int sync) {}
+static inline int nfs_readpage_from_fscache(struct nfs_open_context *ctx,
+					    struct inode *inode, struct page *page)
+{
+	return -ENOBUFS;
+}
+static inline int nfs_readpages_from_fscache(struct nfs_open_context *ctx,
+					     struct inode *inode,
+					     struct address_space *mapping,
+					     struct list_head *pages,
+					     unsigned *nr_pages)
+{
+	return -ENOBUFS;
+}
+
+static inline void nfs_writepage_to_fscache(struct inode *inode, struct page *page)
+{
+	BUG_ON(PageNfsCached(page));
+}
+
+#endif /* CONFIG_NFS_FSCACHE */
+#endif /* _NFS_FSCACHE_H */
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index c7d34b7..a7b1e20 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -78,6 +78,7 @@ void nfs_clear_inode(struct inode *inode
 	BUG_ON(atomic_read(&NFS_I(inode)->data_updates) != 0);
 	nfs_zap_acl_cache(inode);
 	nfs_access_zap_cache(inode);
+	nfs_fscache_release_fh_cookie(NFS_SERVER(inode), NFS_I(inode));
 }
 
 /**
@@ -123,6 +124,8 @@ void nfs_zap_caches(struct inode *inode)
 	spin_lock(&inode->i_lock);
 	nfs_zap_caches_locked(inode);
 	spin_unlock(&inode->i_lock);
+
+	nfs_fscache_zap_fh_cookie(NFS_SERVER(inode), NFS_I(inode));
 }
 
 static void nfs_zap_acl_cache(struct inode *inode)
@@ -201,6 +204,7 @@ nfs_fhget(struct super_block *sb, struct
 	};
 	struct inode *inode = ERR_PTR(-ENOENT);
 	unsigned long hash;
+	int maycache = 1;
 
 	if ((fattr->valid & NFS_ATTR_FATTR) == 0)
 		goto out_no_inode;
@@ -252,6 +256,7 @@ nfs_fhget(struct super_block *sb, struct
 				else
 					inode->i_op = &nfs_mountpoint_inode_operations;
 				inode->i_fop = NULL;
+				maycache = 0;
 			}
 		} else if (S_ISLNK(inode->i_mode))
 			inode->i_op = &nfs_symlink_inode_operations;
@@ -284,6 +289,8 @@ nfs_fhget(struct super_block *sb, struct
 		memset(nfsi->cookieverf, 0, sizeof(nfsi->cookieverf));
 		nfsi->access_cache = RB_ROOT;
 
+		nfs_fscache_get_fh_cookie(sb, nfsi, maycache);
+
 		unlock_new_inode(inode);
 	} else
 		nfs_refresh_inode(inode, fattr);
@@ -366,6 +373,7 @@ void nfs_setattr_update_inode(struct ino
 	if ((attr->ia_valid & ATTR_SIZE) != 0) {
 		nfs_inc_stats(inode, NFSIOS_SETATTRTRUNC);
 		inode->i_size = attr->ia_size;
+		nfs_fscache_set_size(NFS_SERVER(inode), NFS_I(inode), inode->i_size);
 		vmtruncate(inode, attr->ia_size);
 	}
 }
@@ -550,6 +558,8 @@ int nfs_open(struct inode *inode, struct
 	ctx->mode = filp->f_mode;
 	nfs_file_set_open_context(filp, ctx);
 	put_nfs_open_context(ctx);
+	if ((filp->f_flags & O_ACCMODE) != O_RDONLY)
+		nfs_fscache_disable_fh_cookie(inode);
 	return 0;
 }
 
@@ -688,6 +698,8 @@ int nfs_revalidate_mapping(struct inode 
 		}
 		spin_unlock(&inode->i_lock);
 
+		nfs_fscache_renew_fh_cookie(NFS_SERVER(inode), nfsi);
+
 		dfprintk(PAGECACHE, "NFS: (%s/%Ld) data cache invalidated\n",
 				inode->i_sb->s_id,
 				(long long)NFS_FILEID(inode));
@@ -921,11 +933,13 @@ static int nfs_update_inode(struct inode
 			if (data_stable) {
 				inode->i_size = new_isize;
 				invalid |= NFS_INO_INVALID_DATA;
+				nfs_fscache_set_size(NFS_SERVER(inode), nfsi, inode->i_size);
 			}
 			invalid |= NFS_INO_INVALID_ATTR;
 		} else if (new_isize > cur_isize) {
 			inode->i_size = new_isize;
 			invalid |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA;
+			nfs_fscache_set_size(NFS_SERVER(inode), nfsi, inode->i_size);
 		}
 		nfsi->cache_change_attribute = jiffies;
 		dprintk("NFS: isize change on server for file %s/%ld\n",
@@ -1140,6 +1154,10 @@ static int __init init_nfs_fs(void)
 {
 	int err;
 
+	err = nfs_fscache_register();
+	if (err < 0)
+		goto out6;
+
 	err = nfs_fs_proc_init();
 	if (err)
 		goto out5;
@@ -1186,6 +1204,8 @@ out3:
 out4:
 	nfs_fs_proc_exit();
 out5:
+	nfs_fscache_unregister();
+out6:
 	return err;
 }
 
@@ -1196,6 +1216,7 @@ static void __exit exit_nfs_fs(void)
 	nfs_destroy_readpagecache();
 	nfs_destroy_inodecache();
 	nfs_destroy_nfspagecache();
+	nfs_fscache_unregister();
 #ifdef CONFIG_PROC_FS
 	rpc_proc_unregister("nfs");
 #endif
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index bea0b01..2fce9bd 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -4,6 +4,30 @@
 
 #include <linux/mount.h>
 
+#define NFS_PAGE_WRITING	0
+#define NFS_PAGE_CACHED		1
+
+#define PageNfsBit(bit, page)		test_bit(bit, &(page)->private)
+
+#define SetPageNfsBit(bit, page)		\
+do {						\
+	SetPagePrivate((page));			\
+	set_bit(bit, &(page)->private);		\
+} while(0)
+
+#define ClearPageNfsBit(bit, page)		\
+do {						\
+	clear_bit(bit, &(page)->private);	\
+} while(0)
+
+#define PageNfsWriting(page)		PageNfsBit(NFS_PAGE_WRITING, (page))
+#define SetPageNfsWriting(page)		SetPageNfsBit(NFS_PAGE_WRITING, (page))
+#define ClearPageNfsWriting(page)	ClearPageNfsBit(NFS_PAGE_WRITING, (page))
+
+#define PageNfsCached(page)		PageNfsBit(NFS_PAGE_CACHED, (page))
+#define SetPageNfsCached(page)		SetPageNfsBit(NFS_PAGE_CACHED, (page))
+#define ClearPageNfsCached(page)	ClearPageNfsBit(NFS_PAGE_CACHED, (page))
+
 struct nfs_string;
 struct nfs_mount_data;
 struct nfs4_mount_data;
@@ -27,6 +51,11 @@ struct nfs_clone_mount {
 	rpc_authflavor_t authflavor;
 };
 
+/*
+ * include filesystem caching stuff here
+ */
+#include "fscache.h"
+
 /* client.c */
 extern struct rpc_program nfs_program;
 
@@ -153,6 +182,9 @@ extern int nfs4_path_walk(struct nfs_ser
 			  const char *path);
 #endif
 
+/* read.c */
+extern int nfs_readpage_async(struct nfs_open_context *, struct inode *, struct page *);
+
 /*
  * Determine the device name as a string
  */
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 36e902a..c45f724 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -17,6 +17,7 @@ #include <linux/nfs4.h>
 #include <linux/nfs_page.h>
 #include <linux/nfs_fs.h>
 #include <linux/nfs_mount.h>
+#include "internal.h"
 
 #define NFS_PARANOIA 1
 
@@ -84,7 +85,7 @@ nfs_create_request(struct nfs_open_conte
 	atomic_set(&req->wb_complete, 0);
 	req->wb_index	= page->index;
 	page_cache_get(page);
-	BUG_ON(PagePrivate(page));
+	BUG_ON(PageNfsWriting(page));
 	BUG_ON(!PageLocked(page));
 	BUG_ON(page->mapping->host != inode);
 	req->wb_offset  = offset;
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 3576650..f5531f0 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -26,11 +26,13 @@ #include <linux/pagemap.h>
 #include <linux/sunrpc/clnt.h>
 #include <linux/nfs_fs.h>
 #include <linux/nfs_page.h>
+#include <linux/nfs_mount.h>
 #include <linux/smp_lock.h>
 
 #include <asm/system.h>
 
 #include "iostat.h"
+#include "internal.h"
 
 #define NFSDBG_FACILITY		NFSDBG_PAGECACHE
 
@@ -200,13 +202,18 @@ static int nfs_readpage_sync(struct nfs_
 		SetPageUptodate(page);
 	result = 0;
 
+	nfs_readpage_to_fscache(inode, page, 1);
+	unlock_page(page);
+
+	return result;
+
 io_error:
 	unlock_page(page);
 	nfs_readdata_free(rdata);
 	return result;
 }
 
-static int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
+int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
 		struct page *page)
 {
 	LIST_HEAD(one_request);
@@ -231,6 +238,11 @@ static int nfs_readpage_async(struct nfs
 
 static void nfs_readpage_release(struct nfs_page *req)
 {
+	struct inode *d_inode = req->wb_context->dentry->d_inode;
+
+	if (PageUptodate(req->wb_page))
+		nfs_readpage_to_fscache(d_inode, req->wb_page, 0);
+
 	unlock_page(req->wb_page);
 
 	dprintk("NFS: read done (%s/%Ld %d@%Ld)\n",
@@ -613,6 +625,10 @@ int nfs_readpage(struct file *file, stru
 		ctx = get_nfs_open_context((struct nfs_open_context *)
 				file->private_data);
 	if (!IS_SYNC(inode)) {
+		error = nfs_readpage_from_fscache(ctx, inode, page);
+		if (error == 0)
+			goto out;
+
 		error = nfs_readpage_async(ctx, inode, page);
 		goto out;
 	}
@@ -643,6 +659,7 @@ readpage_async_filler(void *data, struct
 	unsigned int len;
 
 	nfs_wb_page(inode, page);
+
 	len = nfs_page_length(inode, page);
 	if (len == 0)
 		return nfs_return_empty_page(page);
@@ -682,6 +699,17 @@ int nfs_readpages(struct file *filp, str
 	} else
 		desc.ctx = get_nfs_open_context((struct nfs_open_context *)
 				filp->private_data);
+
+	/* attempt to read as many of the pages as possible from the cache
+	 * - this returns -ENOBUFS immediately if the cookie is negative
+	 */
+	ret = nfs_readpages_from_fscache(desc.ctx, inode, mapping,
+					 pages, &nr_pages);
+	if (ret == 0) {
+		put_nfs_open_context(desc.ctx);
+		return ret; /* all read */
+	}
+
 	ret = read_cache_pages(mapping, pages, readpage_async_filler, &desc);
 	if (!list_empty(&head)) {
 		int err = nfs_pagein_list(&head, server->rpages);
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 97cfb14..f7e6b2b 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -291,6 +291,7 @@ static void nfs_show_mount_options(struc
 		{ NFS_MOUNT_NOAC, ",noac", "" },
 		{ NFS_MOUNT_NONLM, ",nolock", "" },
 		{ NFS_MOUNT_NOACL, ",noacl", "" },
+		{ NFS_MOUNT_FSCACHE, ",fsc", "" },
 		{ 0, NULL, NULL }
 	};
 	const struct proc_nfs_info *nfs_infop;
diff --git a/fs/nfs/sysctl.c b/fs/nfs/sysctl.c
index 2fe3403..7a25a6d 100644
--- a/fs/nfs/sysctl.c
+++ b/fs/nfs/sysctl.c
@@ -14,6 +14,7 @@ #include <linux/nfs_idmap.h>
 #include <linux/nfs_fs.h>
 
 #include "callback.h"
+#include "internal.h"
 
 static const int nfs_set_port_min = 0;
 static const int nfs_set_port_max = 65535;
@@ -55,6 +56,48 @@ #endif
 		.proc_handler	= &proc_dointvec_jiffies,
 		.strategy	= &sysctl_jiffies,
 	},
+#ifdef CONFIG_NFS_FSCACHE
+	{
+		.ctl_name = CTL_UNNUMBERED,
+		.procname = "fscache_from_error",
+		.data = &nfs_fscache_from_error,
+		.maxlen = sizeof(int),
+		.mode = 0644,
+		.proc_handler = &proc_dointvec,
+	},
+	{
+		.ctl_name = CTL_UNNUMBERED,
+		.procname = "fscache_to_error",
+		.data = &nfs_fscache_to_error,
+		.maxlen = sizeof(int),
+		.mode = 0644,
+		.proc_handler = &proc_dointvec,
+	},
+	{
+		.ctl_name = CTL_UNNUMBERED,
+		.procname = "fscache_uncache_page",
+		.data = &nfs_fscache_uncache_page,
+		.maxlen = sizeof(int),
+		.mode = 0644,
+		.proc_handler = &proc_dointvec,
+	},
+	{
+		.ctl_name = CTL_UNNUMBERED,
+		.procname = "fscache_to_pages",
+		.data = &nfs_fscache_to_pages,
+		.maxlen = sizeof(int),
+		.mode = 0644,
+		.proc_handler = &proc_dointvec_minmax,
+	},
+	{
+		.ctl_name = CTL_UNNUMBERED,
+		.procname = "fscache_from_pages",
+		.data = &nfs_fscache_from_pages,
+		.maxlen = sizeof(int),
+		.mode = 0644,
+		.proc_handler = &proc_dointvec,
+	},
+#endif
 	{ .ctl_name = 0 }
 };
 
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 7bc500b..f654815 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -63,6 +63,7 @@ #include <linux/smp_lock.h>
 
 #include "delegation.h"
 #include "iostat.h"
+#include "internal.h"
 
 #define NFSDBG_FACILITY		NFSDBG_PAGECACHE
 
@@ -163,6 +164,9 @@ static void nfs_grow_file(struct page *p
 		return;
 	nfs_inc_stats(inode, NFSIOS_EXTENDWRITE);
 	i_size_write(inode, end);
+#ifdef FSCACHE_WRITE_SUPPORT
+	nfs_set_fscsize(NFS_SERVER(inode), NFS_I(inode), end);
+#endif
 }
 
 /* We can set the PG_uptodate flag if we see that a write request
@@ -342,6 +346,9 @@ do_it:
 		err = -EBADF;
 		goto out;
 	}
+
+	nfs_writepage_to_fscache(inode, page);
+
 	lock_kernel();
 	if (!IS_SYNC(inode) && inode_referenced) {
 		err = nfs_writepage_async(ctx, inode, page, 0, offset);
@@ -424,7 +431,7 @@ static int nfs_inode_add_request(struct 
 		if (nfs_have_delegation(inode, FMODE_WRITE))
 			nfsi->change_attr++;
 	}
-	SetPagePrivate(req->wb_page);
+	SetPageNfsWriting(req->wb_page);
 	nfsi->npages++;
 	atomic_inc(&req->wb_count);
 	return 0;
@@ -441,7 +448,7 @@ static void nfs_inode_remove_request(str
 	BUG_ON (!NFS_WBACK_BUSY(req));
 
 	spin_lock(&nfsi->req_lock);
-	ClearPagePrivate(req->wb_page);
+	ClearPageNfsWriting(req->wb_page);
 	radix_tree_delete(&nfsi->nfs_page_tree, req->wb_index);
 	nfsi->npages--;
 	if (!nfsi->npages) {
diff --git a/include/linux/nfs4_mount.h b/include/linux/nfs4_mount.h
index 26b4c83..15199cc 100644
--- a/include/linux/nfs4_mount.h
+++ b/include/linux/nfs4_mount.h
@@ -65,6 +65,7 @@ #define NFS4_MOUNT_INTR		0x0002	/* 1 */
 #define NFS4_MOUNT_NOCTO	0x0010	/* 1 */
 #define NFS4_MOUNT_NOAC		0x0020	/* 1 */
 #define NFS4_MOUNT_STRICTLOCK	0x1000	/* 1 */
+#define NFS4_MOUNT_FSCACHE	0x4000	/* 1 */
 #define NFS4_MOUNT_FLAGMASK	0xFFFF
 
 #endif
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index e7ad443..b768c43 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -30,6 +30,7 @@ #include <linux/nfs_fs_sb.h>
 
 #include <linux/rwsem.h>
 #include <linux/mempool.h>
+#include <linux/fscache.h>
 
 /*
  * Enable debugging support for nfs client.
@@ -185,6 +186,9 @@ #ifdef CONFIG_NFS_V4
 	int			 delegation_state;
 	struct rw_semaphore	rwsem;
 #endif /* CONFIG_NFS_V4*/
+#ifdef CONFIG_NFS_FSCACHE
+	struct fscache_cookie	*fscache;
+#endif
 	struct inode		vfs_inode;
 };
 
@@ -578,6 +582,7 @@ #define NFSDBG_FILE		0x0040
 #define NFSDBG_ROOT		0x0080
 #define NFSDBG_CALLBACK		0x0100
 #define NFSDBG_CLIENT		0x0200
+#define NFSDBG_FSCACHE		0x0400
 #define NFSDBG_ALL		0xFFFF
 
 #ifdef __KERNEL__
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 6d0be0e..54221e0 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -3,6 +3,7 @@ #define _NFS_FS_SB
 
 #include <linux/list.h>
 #include <linux/backing-dev.h>
+#include <linux/fscache.h>
 
 struct nfs_iostats;
 
@@ -66,6 +67,10 @@ #ifdef CONFIG_NFS_V4
 	char			cl_ipaddr[16];
 	unsigned char		cl_id_uniquifier;
 #endif
+
+#ifdef CONFIG_NFS_FSCACHE
+	struct fscache_cookie	*fscache;	/* client index cache cookie */
+#endif
 };
 
 /*
diff --git a/include/linux/nfs_mount.h b/include/linux/nfs_mount.h
index 659c754..278bb4e 100644
--- a/include/linux/nfs_mount.h
+++ b/include/linux/nfs_mount.h
@@ -61,6 +61,7 @@ #define NFS_MOUNT_BROKEN_SUID	0x0400	/* 
 #define NFS_MOUNT_NOACL		0x0800	/* 4 */
 #define NFS_MOUNT_STRICTLOCK	0x1000	/* reserved for NFSv4 */
 #define NFS_MOUNT_SECFLAVOUR	0x2000	/* 5 */
+#define NFS_MOUNT_FSCACHE	0x4000
 #define NFS_MOUNT_FLAGMASK	0xFFFF
 
 #endif

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 6/7] FS-Cache: CacheFiles: ia64: missing copy_page export [try #12]
  2006-08-18 15:35 [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #12] David Howells
                   ` (4 preceding siblings ...)
  2006-08-18 15:35 ` [PATCH 5/7] NFS: Use local caching " David Howells
@ 2006-08-18 15:35 ` David Howells
  2006-08-18 15:35 ` [PATCH 7/7] FS-Cache: CacheFiles: A cache that backs onto a mounted filesystem " David Howells
  6 siblings, 0 replies; 11+ messages in thread
From: David Howells @ 2006-08-18 15:35 UTC (permalink / raw)
  To: torvalds, akpm, steved, trond.myklebust
  Cc: linux-fsdevel, linux-cachefs, nfsv4, linux-kernel

This one-line patch fixes the missing export of copy_page introduced
by the cachefile patches.  This patch is not yet upstream, but is required
for cachefile on ia64.  It will be pushed upstream when cachefile goes
upstream.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-Off-By: David Howells <dhowells@redhat.com>
---

 arch/ia64/kernel/ia64_ksyms.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/kernel/ia64_ksyms.c b/arch/ia64/kernel/ia64_ksyms.c
index 3ead20f..6746a3e 100644
--- a/arch/ia64/kernel/ia64_ksyms.c
+++ b/arch/ia64/kernel/ia64_ksyms.c
@@ -42,6 +42,7 @@ EXPORT_SYMBOL(__do_clear_user);
 EXPORT_SYMBOL(__strlen_user);
 EXPORT_SYMBOL(__strncpy_from_user);
 EXPORT_SYMBOL(__strnlen_user);
+EXPORT_SYMBOL(copy_page);
 
 /* from arch/ia64/lib */
 extern void __divsi3(void);

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 7/7] FS-Cache: CacheFiles: A cache that backs onto a mounted filesystem [try #12]
  2006-08-18 15:35 [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #12] David Howells
                   ` (5 preceding siblings ...)
  2006-08-18 15:35 ` [PATCH 6/7] FS-Cache: CacheFiles: ia64: missing copy_page export " David Howells
@ 2006-08-18 15:35 ` David Howells
  6 siblings, 0 replies; 11+ messages in thread
From: David Howells @ 2006-08-18 15:35 UTC (permalink / raw)
  To: torvalds, akpm, steved, trond.myklebust
  Cc: linux-fsdevel, linux-cachefs, nfsv4, linux-kernel

Add a cache backend that permits a mounted filesystem to be used as a backing
store for the cache.


CacheFiles uses a userspace daemon to do some of the cache management - such as
reaping stale nodes and culling.  This is called cachefilesd and lives in
/sbin.  The source for the daemon can be downloaded from:

	http://people.redhat.com/~dhowells/cachefs/cachefilesd.c

And an example configuration from:

	http://people.redhat.com/~dhowells/cachefs/cachefilesd.conf

The filesystem and data integrity of the cache are only as good as those of the
filesystem providing the backing services.  Note that CacheFiles does not
attempt to journal anything since the journalling interfaces of the various
filesystems are very specific in nature.

CacheFiles creates a proc-file - "/proc/fs/cachefiles" - that is used for
communication with the daemon.  Only one thing may have this open at once, and
whilst it is open, a cache is at least partially in existence.  The daemon
opens this and sends commands down it to control the cache.

CacheFiles is currently limited to a single cache.

CacheFiles attempts to maintain at least a certain percentage of free space on
the filesystem, shrinking the cache by culling the objects it contains to make
space if necessary - see the "Cache Culling" section.  This means it can be
placed on the same medium as a live set of data, and will expand to make use of
spare space and automatically contract when the set of data requires more
space.


============
REQUIREMENTS
============

The use of CacheFiles and its daemon requires the following features to be
available in the system and in the cache filesystem:

	- dnotify.

	- extended attributes (xattrs).

	- openat() and friends.

	- bmap() support on files in the filesystem (FIBMAP ioctl).

	- The use of bmap() to detect a partial page at the end of the file.

It is strongly recommended that the "dir_index" option is enabled on Ext3
filesystems being used as a cache.


=============
CONFIGURATION
=============

The cache is configured by a script in /etc/cachefilesd.conf.  These commands
set up cache ready for use.  The following script commands are available:

 (*) brun <N>%
 (*) bcull <N>%
 (*) bstop <N>%

	Configure the culling limits.  Optional.  See the section on culling
	The defaults are 7%, 5% and 1% respectively.

 (*) dir <path>

	Specify the directory containing the root of the cache.  Mandatory.

 (*) tag <name>

	Specify a tag to FS-Cache to use in distinguishing multiple caches.
	Optional.  The default is "CacheFiles".

 (*) debug <mask>

	Specify a numeric bitmask to control debugging in the kernel module.
	Optional.  The default is zero (all off).


==================
STARTING THE CACHE
==================

The cache is started by running the daemon.  The daemon opens the cache proc
file, configures the cache and tells it to begin caching.  At that point the
cache binds to fscache and the cache becomes live.

The daemon is run as follows:

	/sbin/cachefilesd [-d]* [-s] [-n] [-f <configfile>]

The flags are:

 (*) -d

	Increase the debugging level.  This can be specified multiple times and
	is cumulative with itself.

 (*) -s

	Send messages to stderr instead of syslog.

 (*) -n

	Don't daemonise and go into background.

 (*) -f <configfile>

	Use an alternative configuration file rather than the default one.


===============
THINGS TO AVOID
===============

Do not mount other things within the cache as this will cause problems.  The
kernel module contains its own very cut-down path walking facility that ignores
mountpoints, but the daemon can't avoid them.

Do not create, rename or unlink files and directories in the cache whilst the
cache is active, as this may cause the state to become uncertain.

Renaming files in the cache might make objects appear to be other objects (the
filename is part of the lookup key).

Do not change or remove the extended attributes attached to cache files by the
cache as this will cause the cache state management to get confused.

Do not create files or directories in the cache, lest the cache get confused or
serve incorrect data.

Do not chmod files in the cache.  The module creates things with minimal
permissions to prevent random users being able to access them directly.


=============
CACHE CULLING
=============

The cache may need culling occasionally to make space.  This involves
discarding objects from the cache that have been used less recently than
anything else.  Culling is based on the access time of data objects.  Empty
directories are culled if not in use.

Cache culling is done on the basis of the percentage of blocks available in the
underlying filesystem.  There are three "limits":

 (*) brun

     If the amount of available space in the cache rises above this limit, then
     culling is turned off.

 (*) bcull

     If the amount of available space in the cache falls below this limit, then
     culling is started.

 (*) bstop

     If the amount of available space in the cache falls below this limit, then
     no further allocation of disk space is permitted until culling has raised
     the amount above this limit again.

These must be configured thusly:

	0 <= bstop < bcull < brun < 100

Note that these are percentages of available space, and do _not_ appear as 100
minus the percentage displayed by the "df" program.

The userspace daemon scans the cache to build up a table of cullable objects.
These are then culled in least recently used order.  A new scan of the cache is
started as soon as space is made in the table.  Objects will be skipped if
their atimes have changed or if the kernel module says it is still using them.


===============
CACHE STRUCTURE
===============

The CacheFiles module will create two directories in the directory it was
given:

 (*) cache/

 (*) graveyard/

The active cache objects all reside in the first directory.  The CacheFiles
kernel module moves any retired or culled objects that it can't simply unlink
to the graveyard from which the daemon will actually delete them.

The daemon uses dnotify to monitor the graveyard directory, and will delete
anything that appears therein.


The module represents index objects as directories with the filename "I..." or
"J...".  Note that the "cache/" directory is itself a special index.

Data objects are represented as files if they have no children, or directories
if they do.  Their filenames all begin "D..." or "E...".  If represented as a
directory, data objects will have a file in the directory called "data" that
actually holds the data.

Special objects are similar to data objects, except their filenames begin
"S..." or "T...".


If an object has children, then it will be represented as a directory.
Immediately in the representative directory are a collection of directories
named for hash values of the child object keys with an '@' prepended.  Into
this directory, if possible, will be placed the representations of the child
objects:

	INDEX     INDEX      INDEX                             DATA FILES
	========= ========== ================================= ================
	cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400
	cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...DB1ry
	cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...N22ry
	cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...FP1ry


If the key is so long that it exceeds NAME_MAX with the decorations added on to
it, then it will be cut into pieces, the first few of which will be used to
make a nest of directories, and the last one of which will be the objects
inside the last directory.  The names of the intermediate directories will have
'+' prepended:

	J1223/@23/+xy...z/+kl...m/Epqr


Note that keys are raw data, and not only may they exceed NAME_MAX in size,
they may also contain things like '/' and NUL characters, and so they may not
be suitable for turning directly into a filename.

To handle this, CacheFiles will use a suitably printable filename directly and
"base-64" encode ones that aren't directly suitable.  The two versions of
object filenames indicate the encoding:

	OBJECT TYPE	PRINTABLE	ENCODED
	===============	===============	===============
	Index		"I..."		"J..."
	Data		"D..."		"E..."
	Special		"S..."		"T..."

Intermediate directories are always "@" or "+" as appropriate.


Each object in the cache has an extended attribute label that holds the object
type ID (required to distinguish special objects) and the auxiliary data from
the netfs.  The latter is used to detect stale objects in the cache and update
or retire them.


Note that CacheFiles will erase from the cache any file it doesn't recognise or
any file of an incorrect type (such as a FIFO file or a device file).


This documentation is added by the patch to:

	Documentation/filesystems/caching/cachefiles.txt

Signed-Off-By: David Howells <dhowells@redhat.com>
---

 Documentation/filesystems/caching/cachefiles.txt |  281 +++++
 fs/Kconfig                                       |   20 
 fs/Makefile                                      |    1 
 fs/buffer.c                                      |    2 
 fs/cachefiles/Makefile                           |   18 
 fs/cachefiles/cf-bind.c                          |  279 +++++
 fs/cachefiles/cf-interface.c                     | 1299 ++++++++++++++++++++++
 fs/cachefiles/cf-key.c                           |  157 +++
 fs/cachefiles/cf-main.c                          |  129 ++
 fs/cachefiles/cf-namei.c                         |  825 ++++++++++++++
 fs/cachefiles/cf-proc.c                          |  498 ++++++++
 fs/cachefiles/cf-sysctl.c                        |   69 +
 fs/cachefiles/cf-xattr.c                         |  295 +++++
 fs/cachefiles/internal.h                         |  308 +++++
 fs/fcntl.c                                       |    2 
 fs/file_table.c                                  |    1 
 include/linux/fs.h                               |    2 
 include/linux/pagemap.h                          |    6 
 kernel/auditsc.c                                 |    2 
 mm/filemap.c                                     |   98 ++
 20 files changed, 4292 insertions(+), 0 deletions(-)

diff --git a/Documentation/filesystems/caching/cachefiles.txt b/Documentation/filesystems/caching/cachefiles.txt
new file mode 100644
index 0000000..37b6385
--- /dev/null
+++ b/Documentation/filesystems/caching/cachefiles.txt
@@ -0,0 +1,281 @@
+	       ===============================================
+	       CacheFiles: CACHE ON ALREADY MOUNTED FILESYSTEM
+	       ===============================================
+
+Contents:
+
+ (*) Overview.
+
+ (*) Requirements.
+
+ (*) Configuration.
+
+ (*) Starting the cache.
+
+ (*) Things to avoid.
+
+
+========
+OVERVIEW
+========
+
+CacheFiles is a caching backend that's meant to use as a cache a directory on
+an already mounted filesystem of a local type (such as Ext3).
+
+CacheFiles uses a userspace daemon to do some of the cache management - such as
+reaping stale nodes and culling.  This is called cachefilesd and lives in
+/sbin.
+
+The filesystem and data integrity of the cache are only as good as those of the
+filesystem providing the backing services.  Note that CacheFiles does not
+attempt to journal anything since the journalling interfaces of the various
+filesystems are very specific in nature.
+
+CacheFiles creates a proc-file - "/proc/fs/cachefiles" - that is used for
+communication with the daemon.  Only one thing may have this open at once, and
+whilst it is open, a cache is at least partially in existence.  The daemon
+opens this and sends commands down it to control the cache.
+
+CacheFiles is currently limited to a single cache.
+
+CacheFiles attempts to maintain at least a certain percentage of free space on
+the filesystem, shrinking the cache by culling the objects it contains to make
+space if necessary - see the "Cache Culling" section.  This means it can be
+placed on the same medium as a live set of data, and will expand to make use of
+spare space and automatically contract when the set of data requires more
+space.
+
+
+============
+REQUIREMENTS
+============
+
+The use of CacheFiles and its daemon requires the following features to be
+available in the system and in the cache filesystem:
+
+	- dnotify.
+
+	- extended attributes (xattrs).
+
+	- openat() and friends.
+
+	- bmap() support on files in the filesystem (FIBMAP ioctl).
+
+	- The use of bmap() to detect a partial page at the end of the file.
+
+It is strongly recommended that the "dir_index" option is enabled on Ext3
+filesystems being used as a cache.
+
+
+=============
+CONFIGURATION
+=============
+
+The cache is configured by a script in /etc/cachefilesd.conf.  These commands
+set up cache ready for use.  The following script commands are available:
+
+ (*) brun <N>%
+ (*) bcull <N>%
+ (*) bstop <N>%
+
+	Configure the culling limits.  Optional.  See the section on culling
+	The defaults are 7%, 5% and 1% respectively.
+
+ (*) dir <path>
+
+	Specify the directory containing the root of the cache.  Mandatory.
+
+ (*) tag <name>
+
+	Specify a tag to FS-Cache to use in distinguishing multiple caches.
+	Optional.  The default is "CacheFiles".
+
+ (*) debug <mask>
+
+	Specify a numeric bitmask to control debugging in the kernel module.
+	Optional.  The default is zero (all off).  The following values can be
+	OR'd into the mask to collect various information:
+
+		1	Turn on trace of function entry (_enter() macros)
+		2	Turn on trace of function exit (_leave() macros)
+		4	Turn on trace of internal debug points (_debug())
+
+	This mask can also be set through /proc/sys/fs/cachefiles/debug.
+
+
+==================
+STARTING THE CACHE
+==================
+
+The cache is started by running the daemon.  The daemon opens the cache proc
+file, configures the cache and tells it to begin caching.  At that point the
+cache binds to fscache and the cache becomes live.
+
+The daemon is run as follows:
+
+	/sbin/cachefilesd [-d]* [-s] [-n] [-f <configfile>]
+
+The flags are:
+
+ (*) -d
+
+	Increase the debugging level.  This can be specified multiple times and
+	is cumulative with itself.
+
+ (*) -s
+
+	Send messages to stderr instead of syslog.
+
+ (*) -n
+
+	Don't daemonise and go into background.
+
+ (*) -f <configfile>
+
+	Use an alternative configuration file rather than the default one.
+
+
+===============
+THINGS TO AVOID
+===============
+
+Do not mount other things within the cache as this will cause problems.  The
+kernel module contains its own very cut-down path walking facility that ignores
+mountpoints, but the daemon can't avoid them.
+
+Do not create, rename or unlink files and directories in the cache whilst the
+cache is active, as this may cause the state to become uncertain.
+
+Renaming files in the cache might make objects appear to be other objects (the
+filename is part of the lookup key).
+
+Do not change or remove the extended attributes attached to cache files by the
+cache as this will cause the cache state management to get confused.
+
+Do not create files or directories in the cache, lest the cache get confused or
+serve incorrect data.
+
+Do not chmod files in the cache.  The module creates things with minimal
+permissions to prevent random users being able to access them directly.
+
+
+=============
+CACHE CULLING
+=============
+
+The cache may need culling occasionally to make space.  This involves
+discarding objects from the cache that have been used less recently than
+anything else.  Culling is based on the access time of data objects.  Empty
+directories are culled if not in use.
+
+Cache culling is done on the basis of the percentage of blocks available in the
+underlying filesystem.  There are three "limits":
+
+ (*) brun
+
+     If the amount of available space in the cache rises above this limit, then
+     culling is turned off.
+
+ (*) bcull
+
+     If the amount of available space in the cache falls below this limit, then
+     culling is started.
+
+ (*) bstop
+
+     If the amount of available space in the cache falls below this limit, then
+     no further allocation of disk space is permitted until culling has raised
+     the amount above this limit again.
+
+These must be configured thusly:
+
+	0 <= bstop < bcull < brun < 100
+
+Note that these are percentages of available space, and do _not_ appear as 100
+minus the percentage displayed by the "df" program.
+
+The userspace daemon scans the cache to build up a table of cullable objects.
+These are then culled in least recently used order.  A new scan of the cache is
+started as soon as space is made in the table.  Objects will be skipped if
+their atimes have changed or if the kernel module says it is still using them.
+
+
+===============
+CACHE STRUCTURE
+===============
+
+The CacheFiles module will create two directories in the directory it was
+given:
+
+ (*) cache/
+
+ (*) graveyard/
+
+The active cache objects all reside in the first directory.  The CacheFiles
+kernel module moves any retired or culled objects that it can't simply unlink
+to the graveyard from which the daemon will actually delete them.
+
+The daemon uses dnotify to monitor the graveyard directory, and will delete
+anything that appears therein.
+
+
+The module represents index objects as directories with the filename "I..." or
+"J...".  Note that the "cache/" directory is itself a special index.
+
+Data objects are represented as files if they have no children, or directories
+if they do.  Their filenames all begin "D..." or "E...".  If represented as a
+directory, data objects will have a file in the directory called "data" that
+actually holds the data.
+
+Special objects are similar to data objects, except their filenames begin
+"S..." or "T...".
+
+
+If an object has children, then it will be represented as a directory.
+Immediately in the representative directory are a collection of directories
+named for hash values of the child object keys with an '@' prepended.  Into
+this directory, if possible, will be placed the representations of the child
+objects:
+
+	INDEX     INDEX      INDEX                             DATA FILES
+	========= ========== ================================= ================
+	cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400
+	cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...DB1ry
+	cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...N22ry
+	cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...FP1ry
+
+
+If the key is so long that it exceeds NAME_MAX with the decorations added on to
+it, then it will be cut into pieces, the first few of which will be used to
+make a nest of directories, and the last one of which will be the objects
+inside the last directory.  The names of the intermediate directories will have
+'+' prepended:
+
+	J1223/@23/+xy...z/+kl...m/Epqr
+
+
+Note that keys are raw data, and not only may they exceed NAME_MAX in size,
+they may also contain things like '/' and NUL characters, and so they may not
+be suitable for turning directly into a filename.
+
+To handle this, CacheFiles will use a suitably printable filename directly and
+"base-64" encode ones that aren't directly suitable.  The two versions of
+object filenames indicate the encoding:
+
+	OBJECT TYPE	PRINTABLE	ENCODED
+	===============	===============	===============
+	Index		"I..."		"J..."
+	Data		"D..."		"E..."
+	Special		"S..."		"T..."
+
+Intermediate directories are always "@" or "+" as appropriate.
+
+
+Each object in the cache has an extended attribute label that holds the object
+type ID (required to distinguish special objects) and the auxiliary data from
+the netfs.  The latter is used to detect stale objects in the cache and update
+or retire them.
+
+
+Note that CacheFiles will erase from the cache any file it doesn't recognise or
+any file of an incorrect type (such as a FIFO file or a device file).
diff --git a/fs/Kconfig b/fs/Kconfig
index 36d0051..afec7e1 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -543,6 +543,26 @@ config FSCACHE
 
 	  See Documentation/filesystems/caching/fscache.txt for more information.
 
+config CACHEFILES
+	tristate "Filesystem caching on files"
+	depends on FSCACHE
+	help
+	  This permits use of a mounted filesystem as a cache for other
+	  filesystems - primarily networking filesystems - thus allowing fast
+	  local disk to enhance the speed of slower devices.
+
+	  See Documentation/filesystems/caching/cachefiles.txt for more
+	  information.
+
+config CACHEFILES_DEBUG
+	bool "Debug CacheFiles"
+	depends on CACHEFILES
+	help
+	  This permits debugging to be dynamically enabled in the filesystem
+	  caching on files module.  If this is set, the debugging output may be
+	  enabled by setting bits in /proc/sys/fs/cachefiles/debug or by
+	  including a debugging specifier in /etc/cachefilesd.conf.
+
 endmenu
 
 menu "CD-ROM/DVD Filesystems"
diff --git a/fs/Makefile b/fs/Makefile
index 0b0e827..e5efb4b 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -101,5 +101,6 @@ obj-$(CONFIG_AFS_FS)		+= afs/
 obj-$(CONFIG_BEFS_FS)		+= befs/
 obj-$(CONFIG_HOSTFS)		+= hostfs/
 obj-$(CONFIG_HPPFS)		+= hppfs/
+obj-$(CONFIG_CACHEFILES)	+= cachefiles/
 obj-$(CONFIG_DEBUG_FS)		+= debugfs/
 obj-$(CONFIG_OCFS2_FS)		+= ocfs2/
diff --git a/fs/buffer.c b/fs/buffer.c
index 71649ef..0628445 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -184,6 +184,8 @@ int fsync_super(struct super_block *sb)
 	return sync_blockdev(sb->s_bdev);
 }
 
+EXPORT_SYMBOL(fsync_super);
+
 /*
  * Write out and wait upon all dirty data associated with this
  * device.   Filesystem data as well as the underlying block
diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
new file mode 100644
index 0000000..c1522d9
--- /dev/null
+++ b/fs/cachefiles/Makefile
@@ -0,0 +1,18 @@
+#
+# Makefile for caching in a mounted filesystem
+#
+
+cachefiles-objs := \
+	cf-bind.o \
+	cf-interface.o \
+	cf-key.o \
+	cf-main.o \
+	cf-namei.o \
+	cf-proc.o \
+	cf-xattr.o
+
+ifeq ($(CONFIG_SYSCTL),y)
+cachefiles-objs += cf-sysctl.o
+endif
+
+obj-$(CONFIG_CACHEFILES) := cachefiles.o
diff --git a/fs/cachefiles/cf-bind.c b/fs/cachefiles/cf-bind.c
new file mode 100644
index 0000000..d8cb4c3
--- /dev/null
+++ b/fs/cachefiles/cf-bind.c
@@ -0,0 +1,279 @@
+/* cf-bind.c: bind and unbind a cache from the filesystem backing it
+ *
+ * Copyright (C) 2006 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/sched.h>
+#include <linux/completion.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/namei.h>
+#include <linux/mount.h>
+#include <linux/namespace.h>
+#include <linux/statfs.h>
+#include <linux/proc_fs.h>
+#include <linux/ctype.h>
+#include "internal.h"
+
+static int cachefiles_proc_add_cache(struct cachefiles_cache *cache,
+				     struct vfsmount *mnt);
+
+/*****************************************************************************/
+/*
+ * bind a directory as a cache
+ */
+int cachefiles_proc_bind(struct cachefiles_cache *cache, char *args)
+{
+	_enter("{%u,%u,%u},%s",
+	       cache->brun_percent,
+	       cache->bcull_percent,
+	       cache->bstop_percent,
+	       args);
+
+	/* start by checking things over */
+	ASSERT(cache->bstop_percent >= 0 &&
+	       cache->bstop_percent < cache->bcull_percent &&
+	       cache->bcull_percent < cache->brun_percent &&
+	       cache->brun_percent  < 100);
+
+	if (*args) {
+		kerror("'bind' command doesn't take an argument");
+		return -EINVAL;
+	}
+
+	if (!cache->rootdirname) {
+		kerror("No cache directory specified");
+		return -EINVAL;
+	}
+
+	/* don't permit already bound caches to be re-bound */
+	if (test_bit(CACHEFILES_READY, &cache->flags)) {
+		kerror("Cache already bound");
+		return -EBUSY;
+	}
+
+	/* make sure we have copies of the tag and dirname strings */
+	if (!cache->tag) {
+		/* the tag string is released by the fops->release()
+		 * function, so we don't release it on error here */
+		cache->tag = kstrdup("CacheFiles", GFP_KERNEL);
+		if (!cache->tag)
+			return -ENOMEM;
+	}
+
+	/* add the cache */
+	return cachefiles_proc_add_cache(cache, NULL);
+}
+
+/*****************************************************************************/
+/*
+ * add a cache
+ */
+static int cachefiles_proc_add_cache(struct cachefiles_cache *cache,
+				     struct vfsmount *mnt)
+{
+	struct cachefiles_object *fsdef;
+	struct nameidata nd;
+	struct kstatfs stats;
+	struct dentry *graveyard, *cachedir, *root;
+	int ret;
+
+	_enter("");
+
+	/* allocate the root index object */
+	ret = -ENOMEM;
+
+	fsdef = kmem_cache_alloc(cachefiles_object_jar, SLAB_KERNEL);
+	if (!fsdef)
+		goto error_root_object;
+
+	atomic_set(&fsdef->usage, 1);
+	atomic_set(&fsdef->fscache_usage, 1);
+	fsdef->type = FSCACHE_COOKIE_TYPE_INDEX;
+
+	_debug("- fsdef %p", fsdef);
+
+	/* look up the directory at the root of the cache */
+	memset(&nd, 0, sizeof(nd));
+
+	ret = path_lookup(cache->rootdirname, LOOKUP_DIRECTORY, &nd);
+	if (ret < 0)
+		goto error_open_root;
+
+	/* bind to the special mountpoint we've prepared */
+	if (mnt) {
+		atomic_inc(&nd.mnt->mnt_sb->s_active);
+		mnt->mnt_sb = nd.mnt->mnt_sb;
+		mnt->mnt_flags = nd.mnt->mnt_flags;
+		mnt->mnt_flags |= MNT_NOSUID | MNT_NOEXEC | MNT_NODEV;
+		mnt->mnt_root = dget(nd.dentry);
+		mnt->mnt_mountpoint = mnt->mnt_root;
+
+		/* copy the name, but ignore kstrdup() failing ENOMEM - we'll
+		 * just end up with an devicenameless mountpoint */
+		mnt->mnt_devname = kstrdup(nd.mnt->mnt_devname, GFP_KERNEL);
+		path_release(&nd);
+
+		cache->mnt = mntget(mnt);
+		root = dget(mnt->mnt_root);
+	} else {
+		cache->mnt = nd.mnt;
+		root = nd.dentry;
+
+		nd.mnt = NULL;
+		nd.dentry = NULL;
+		path_release(&nd);
+	}
+
+	/* check parameters */
+	ret = -EOPNOTSUPP;
+	if (!root->d_inode ||
+	    !root->d_inode->i_op ||
+	    !root->d_inode->i_op->lookup ||
+	    !root->d_inode->i_op->mkdir ||
+	    !root->d_inode->i_op->setxattr ||
+	    !root->d_inode->i_op->getxattr ||
+	    !root->d_sb ||
+	    !root->d_sb->s_op ||
+	    !root->d_sb->s_op->statfs ||
+	    !root->d_sb->s_op->sync_fs)
+		goto error_unsupported;
+
+	ret = -EROFS;
+	if (root->d_sb->s_flags & MS_RDONLY)
+		goto error_unsupported;
+
+	/* get the cache size and blocksize */
+	ret = root->d_sb->s_op->statfs(root, &stats);
+	if (ret < 0)
+		goto error_unsupported;
+
+	ret = -ERANGE;
+	if (stats.f_bsize <= 0)
+		goto error_unsupported;
+
+	ret = -EOPNOTSUPP;
+	if (stats.f_bsize > PAGE_SIZE)
+		goto error_unsupported;
+
+	cache->bsize = stats.f_bsize;
+	cache->bshift = 0;
+	if (stats.f_bsize < PAGE_SIZE)
+		cache->bshift = PAGE_SHIFT - long_log2(stats.f_bsize);
+
+	_debug("blksize %u (shift %u)",
+	       cache->bsize, cache->bshift);
+
+	_debug("size %llu, avail %llu", stats.f_blocks, stats.f_bavail);
+
+	/* set up caching limits */
+	stats.f_blocks >>= cache->bshift;
+	do_div(stats.f_blocks, 100);
+	cache->bstop = stats.f_blocks * cache->bstop_percent;
+	cache->bcull = stats.f_blocks * cache->bcull_percent;
+	cache->brun  = stats.f_blocks * cache->brun_percent;
+
+	_debug("limits {%llu,%llu,%llu}",
+	       cache->brun,
+	       cache->bcull,
+	       cache->bstop);
+
+	/* get the cache directory and check its type */
+	cachedir = cachefiles_get_directory(cache, root, "cache");
+	if (IS_ERR(cachedir)) {
+		ret = PTR_ERR(cachedir);
+		goto error_unsupported;
+	}
+
+	fsdef->dentry = cachedir;
+
+	ret = cachefiles_check_object_type(fsdef);
+	if (ret < 0)
+		goto error_unsupported;
+
+	/* get the graveyard directory */
+	graveyard = cachefiles_get_directory(cache, root, "graveyard");
+	if (IS_ERR(graveyard)) {
+		ret = PTR_ERR(graveyard);
+		goto error_unsupported;
+	}
+
+	cache->graveyard = graveyard;
+
+	/* publish the cache */
+	fscache_init_cache(&cache->cache,
+			   &cachefiles_cache_ops,
+			   "%02x:%02x",
+			   MAJOR(fsdef->dentry->d_sb->s_dev),
+			   MINOR(fsdef->dentry->d_sb->s_dev)
+			   );
+
+	ret = fscache_add_cache(&cache->cache, &fsdef->fscache, cache->tag);
+	if (ret < 0)
+		goto error_add_cache;
+
+	/* done */
+	set_bit(CACHEFILES_READY, &cache->flags);
+	dput(root);
+
+	printk(KERN_INFO "CacheFiles:"
+	       " File cache on %s registered\n",
+	       cache->cache.identifier);
+
+	/* check how much space the cache has */
+	cachefiles_has_space(cache, 0);
+
+	return 0;
+
+error_add_cache:
+	dput(cache->graveyard);
+	cache->graveyard = NULL;
+error_unsupported:
+	mntput(cache->mnt);
+	cache->mnt = NULL;
+	dput(fsdef->dentry);
+	fsdef->dentry = NULL;
+	dput(root);
+error_open_root:
+	kmem_cache_free(cachefiles_object_jar, fsdef);
+error_root_object:
+	kerror("Failed to register: %d", ret);
+	return ret;
+}
+
+/*****************************************************************************/
+/*
+ * unbind a cache on fd release
+ */
+void cachefiles_proc_unbind(struct cachefiles_cache *cache)
+{
+	_enter("");
+
+	if (test_bit(CACHEFILES_READY, &cache->flags)) {
+		printk(KERN_INFO "CacheFiles:"
+		       " File cache on %s unregistering\n",
+		       cache->cache.identifier);
+
+		fscache_withdraw_cache(&cache->cache);
+	}
+
+	if (cache->cache.fsdef)
+		cache->cache.ops->put_object(cache->cache.fsdef);
+
+	dput(cache->graveyard);
+	mntput(cache->mnt);
+
+	kfree(cache->rootdirname);
+	kfree(cache->tag);
+
+	_leave("");
+}
diff --git a/fs/cachefiles/cf-interface.c b/fs/cachefiles/cf-interface.c
new file mode 100644
index 0000000..b5ca30f
--- /dev/null
+++ b/fs/cachefiles/cf-interface.c
@@ -0,0 +1,1299 @@
+/* cf-interface.c: CacheFiles to FS-Cache interface
+ *
+ * Copyright (C) 2006 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/file.h>
+#include <linux/mount.h>
+#include <linux/statfs.h>
+#include <linux/buffer_head.h>
+#include "internal.h"
+
+#define list_to_page(head) (list_entry((head)->prev, struct page, lru))
+#define log2(n) ffz(~(n))
+
+/*****************************************************************************/
+/*
+ * look up the nominated node in this cache, creating it if necessary
+ */
+static struct fscache_object *cachefiles_lookup_object(
+	struct fscache_cache *_cache,
+	struct fscache_object *_parent,
+	struct fscache_cookie *cookie)
+{
+	struct cachefiles_object *parent, *object;
+	struct cachefiles_cache *cache;
+	struct cachefiles_xattr *auxdata;
+	unsigned keylen, auxlen;
+	void *buffer;
+	char *key;
+	int ret;
+
+	ASSERT(_parent);
+
+	cache = container_of(_cache, struct cachefiles_cache, cache);
+	parent = container_of(_parent, struct cachefiles_object, fscache);
+
+	_enter("{%s},%p,%p", cache->cache.identifier, parent, cookie);
+
+	/* create a new object record and a temporary leaf image */
+	object = kmem_cache_alloc(cachefiles_object_jar, SLAB_KERNEL);
+	if (!object)
+		goto nomem_object;
+
+	atomic_set(&object->usage, 1);
+	atomic_set(&object->fscache_usage, 1);
+
+	fscache_object_init(&object->fscache);
+	object->fscache.cookie = cookie;
+	object->fscache.cache = parent->fscache.cache;
+
+	object->type = cookie->def->type;
+
+	/* get hold of the raw key
+	 * - stick the length on the front and leave space on the back for the
+	 *   encoder
+	 */
+	buffer = kmalloc((2 + 512) + 3, GFP_KERNEL);
+	if (!buffer)
+		goto nomem_buffer;
+
+	keylen = cookie->def->get_key(cookie->netfs_data, buffer + 2, 512);
+	ASSERTCMP(keylen, <, 512);
+
+	*(uint16_t *)buffer = keylen;
+	((char *)buffer)[keylen + 2] = 0;
+	((char *)buffer)[keylen + 3] = 0;
+	((char *)buffer)[keylen + 4] = 0;
+
+	/* turn the raw key into something that can work with as a filename */
+	key = cachefiles_cook_key(buffer, keylen + 2, object->type);
+	if (!key)
+		goto nomem_key;
+
+	/* get hold of the auxiliary data and prepend the object type */
+	auxdata = buffer;
+	auxlen = 0;
+	if (cookie->def->get_aux) {
+		auxlen = cookie->def->get_aux(cookie->netfs_data,
+					      auxdata->data, 511);
+		ASSERTCMP(auxlen, <, 511);
+	}
+
+	auxdata->len = auxlen + 1;
+	auxdata->type = cookie->def->type;
+
+	/* look up the key, creating any missing bits */
+	ret = cachefiles_walk_to_object(parent, object, key, auxdata);
+	if (ret < 0)
+		goto lookup_failed;
+
+	kfree(buffer);
+	kfree(key);
+	_leave(" = %p", &object->fscache);
+	return &object->fscache;
+
+lookup_failed:
+	kmem_cache_free(cachefiles_object_jar, object);
+	kfree(buffer);
+	kfree(key);
+	_leave(" = %d", ret);
+	return ERR_PTR(ret);
+
+nomem_key:
+	kfree(buffer);
+nomem_buffer:
+	kmem_cache_free(cachefiles_object_jar, object);
+nomem_object:
+	_leave(" = -ENOMEM");
+	return ERR_PTR(-ENOMEM);
+
+}
+
+/*****************************************************************************/
+/*
+ * increment the usage count on an inode object (may fail if unmounting)
+ */
+static struct fscache_object *cachefiles_grab_object(struct fscache_object *_object)
+{
+	struct cachefiles_object *object;
+
+	_enter("%p", _object);
+
+	object = container_of(_object, struct cachefiles_object, fscache);
+
+#ifdef CACHEFILES_DEBUG_SLAB
+	ASSERT((atomic_read(&object->fscache_usage) & 0xffff0000) != 0x6b6b0000);
+#endif
+
+	atomic_inc(&object->fscache_usage);
+	return &object->fscache;
+
+}
+
+/*****************************************************************************/
+/*
+ * lock the semaphore on an object object
+ */
+static void cachefiles_lock_object(struct fscache_object *_object)
+{
+	struct cachefiles_object *object;
+
+	_enter("%p", _object);
+
+	object = container_of(_object, struct cachefiles_object, fscache);
+
+#ifdef CACHEFILES_DEBUG_SLAB
+	ASSERT((atomic_read(&object->fscache_usage) & 0xffff0000) != 0x6b6b0000);
+#endif
+
+	down_write(&object->sem);
+
+}
+
+/*****************************************************************************/
+/*
+ * unlock the semaphore on an object object
+ */
+static void cachefiles_unlock_object(struct fscache_object *_object)
+{
+	struct cachefiles_object *object;
+
+	_enter("%p", _object);
+
+	object = container_of(_object, struct cachefiles_object, fscache);
+	up_write(&object->sem);
+
+}
+
+/*****************************************************************************/
+/*
+ * update the auxilliary data for an object object on disk
+ */
+static void cachefiles_update_object(struct fscache_object *_object)
+{
+	struct cachefiles_object *object;
+	struct cachefiles_cache *cache;
+
+	_enter("%p", _object);
+
+	object = container_of(_object, struct cachefiles_object, fscache);
+	cache = container_of(object->fscache.cache, struct cachefiles_cache, cache);
+
+	//cachefiles_tree_update_object(super, object);
+
+}
+
+/*****************************************************************************/
+/*
+ * dispose of a reference to an object object
+ */
+static void cachefiles_put_object(struct fscache_object *_object)
+{
+	struct cachefiles_object *object;
+	struct cachefiles_cache *cache;
+
+	ASSERT(_object);
+
+	object = container_of(_object, struct cachefiles_object, fscache);
+	_enter("%p{%d}", object, atomic_read(&object->usage));
+
+	ASSERT(object);
+
+	cache = container_of(object->fscache.cache,
+			     struct cachefiles_cache, cache);
+
+#ifdef CACHEFILES_DEBUG_SLAB
+	ASSERT((atomic_read(&object->fscache_usage) & 0xffff0000) != 0x6b6b0000);
+#endif
+
+	if (!atomic_dec_and_test(&object->fscache_usage))
+		return;
+
+	_debug("- kill object %p", object);
+
+	/* delete retired objects */
+	if (test_bit(FSCACHE_OBJECT_RECYCLING, &object->fscache.flags) &&
+	    _object != cache->cache.fsdef
+	    ) {
+		_debug("- retire object %p", object);
+		cachefiles_delete_object(cache, object);
+	}
+
+	/* close the filesystem stuff attached to the object */
+	if (object->backer != object->dentry) {
+		dput(object->backer);
+		object->backer = NULL;
+	}
+
+	/* note that an object is now inactive */
+	write_lock(&cache->active_lock);
+	rb_erase(&object->active_node, &cache->active_nodes);
+	write_unlock(&cache->active_lock);
+
+	dput(object->dentry);
+	object->dentry = NULL;
+
+	/* then dispose of the object */
+	kmem_cache_free(cachefiles_object_jar, object);
+
+	_leave("");
+
+}
+
+/*****************************************************************************/
+/*
+ * sync a cache
+ */
+static void cachefiles_sync_cache(struct fscache_cache *_cache)
+{
+	struct cachefiles_cache *cache;
+	int ret;
+
+	_enter("%p", _cache);
+
+	cache = container_of(_cache, struct cachefiles_cache, cache);
+
+	/* make sure all pages pinned by operations on behalf of the netfs are
+	 * written to disc */
+	ret = fsync_super(cache->mnt->mnt_sb);
+	if (ret == -EIO)
+		cachefiles_io_error(cache,
+				    "Attempt to sync backing fs superblock"
+				    " returned error %d",
+				    ret);
+
+}
+
+/*****************************************************************************/
+/*
+ * set the data size on an object
+ */
+static int cachefiles_set_i_size(struct fscache_object *_object, loff_t i_size)
+{
+	struct cachefiles_object *object;
+	struct iattr newattrs;
+	int ret;
+
+	_enter("%p,%llu", _object, i_size);
+
+	object = container_of(_object, struct cachefiles_object, fscache);
+
+	if (i_size == object->i_size)
+		return 0;
+
+	if (!object->backer)
+		return -ENOBUFS;
+
+	ASSERT(S_ISREG(object->backer->d_inode->i_mode));
+
+	newattrs.ia_size = i_size;
+	newattrs.ia_valid = ATTR_SIZE;
+
+	mutex_lock(&object->backer->d_inode->i_mutex);
+	ret = notify_change(object->backer, &newattrs);
+	mutex_unlock(&object->backer->d_inode->i_mutex);
+
+	if (ret == -EIO) {
+		cachefiles_io_error_obj(object, "Size set failed");
+		ret = -ENOBUFS;
+	}
+
+	_leave(" = %d", ret);
+	return ret;
+
+}
+
+/*****************************************************************************/
+/*
+ * see if we have space for a number of pages in the cache
+ */
+int cachefiles_has_space(struct cachefiles_cache *cache, unsigned nr)
+{
+	struct kstatfs stats;
+	int ret;
+
+	_enter("{%llu,%llu,%llu},%d",
+	       cache->brun, cache->bcull, cache->bstop,  nr);
+
+	/* find out how many pages of blockdev are available */
+	memset(&stats, 0, sizeof(stats));
+
+	ret = cache->mnt->mnt_sb->s_op->statfs(cache->mnt->mnt_root, &stats);
+	if (ret < 0) {
+		if (ret == -EIO)
+			cachefiles_io_error(cache, "statfs failed");
+		return ret;
+	}
+
+	stats.f_bavail >>= cache->bshift;
+
+	_debug("avail %llu", stats.f_bavail);
+
+	/* see if there is sufficient space */
+	stats.f_bavail -= nr;
+
+	ret = -ENOBUFS;
+	if (stats.f_bavail < cache->bstop)
+		goto begin_cull;
+
+	ret = 0;
+	if (stats.f_bavail < cache->bcull)
+		goto begin_cull;
+
+	if (test_bit(CACHEFILES_CULLING, &cache->flags) &&
+	    stats.f_bavail >= cache->brun
+	    ) {
+		if (test_and_clear_bit(CACHEFILES_CULLING, &cache->flags)) {
+			_debug("cease culling");
+			send_sigurg(&cache->cachefilesd->f_owner);
+		}
+	}
+
+	_leave(" = 0");
+	return 0;
+
+begin_cull:
+	if (!test_and_set_bit(CACHEFILES_CULLING, &cache->flags)) {
+		_debug("### CULL CACHE ###");
+		send_sigurg(&cache->cachefilesd->f_owner);
+	}
+
+	_leave(" = %d", ret);
+	return ret;
+
+}
+
+/*****************************************************************************/
+/*
+ * waiting reading backing files
+ */
+static int cachefiles_read_waiter(wait_queue_t *wait, unsigned mode,
+				  int sync, void *_key)
+{
+	struct cachefiles_one_read *monitor =
+		container_of(wait, struct cachefiles_one_read, monitor);
+	struct wait_bit_key *key = _key;
+	struct page *page = wait->private;
+
+	ASSERT(key);
+
+	_enter("{%lu},%u,%d,{%p,%u}",
+	       monitor->netfs_page->index, mode, sync,
+	       key->flags, key->bit_nr);
+
+	if (key->flags != &page->flags ||
+	    key->bit_nr != PG_locked)
+		return 0;
+
+	_debug("--- monitor %p %lx ---", page, page->flags);
+
+	if (!PageUptodate(page) && !PageError(page))
+		dump_stack();
+
+	/* remove from the waitqueue */
+	list_del(&wait->task_list);
+
+	/* move onto the action list and queue for keventd */
+	ASSERT(monitor->object);
+
+	spin_lock(&monitor->object->work_lock);
+	list_move(&monitor->obj_link, &monitor->object->read_list);
+	spin_unlock(&monitor->object->work_lock);
+
+	schedule_work(&monitor->object->read_work);
+
+	return 0;
+
+}
+
+/*****************************************************************************/
+/*
+ * let keventd drive the copying of pages
+ */
+void cachefiles_read_copier_work(void *_object)
+{
+	struct cachefiles_one_read *monitor;
+	struct cachefiles_object *object = _object;
+	struct fscache_cookie *cookie = object->fscache.cookie;
+	struct pagevec pagevec;
+	int error, max;
+
+	_enter("{ino=%lu}", object->backer->d_inode->i_ino);
+
+	pagevec_init(&pagevec, 0);
+
+	max = 8;
+	spin_lock_irq(&object->work_lock);
+
+	while (!list_empty(&object->read_list)) {
+		monitor = list_entry(object->read_list.next,
+				     struct cachefiles_one_read, obj_link);
+		list_del(&monitor->obj_link);
+
+		spin_unlock_irq(&object->work_lock);
+
+		_debug("- copy {%lu}", monitor->back_page->index);
+
+		error = -EIO;
+		if (PageUptodate(monitor->back_page)) {
+			copy_highpage(monitor->netfs_page, monitor->back_page);
+
+			pagevec_add(&pagevec, monitor->netfs_page);
+			cookie->def->mark_pages_cached(
+				cookie->netfs_data,
+				monitor->netfs_page->mapping,
+				&pagevec);
+			pagevec_reinit(&pagevec);
+
+			error = 0;
+		}
+
+		if (error)
+			cachefiles_io_error_obj(
+				object,
+				"readpage failed on backing file %lx",
+				(unsigned long) monitor->back_page->flags);
+
+		page_cache_release(monitor->back_page);
+
+		monitor->end_io_func(monitor->netfs_page,
+				     monitor->context,
+				     error);
+
+		page_cache_release(monitor->netfs_page);
+		fscache_put_context(cookie, monitor->context);
+		kfree(monitor);
+
+		/* let keventd have some air occasionally */
+		max--;
+		if (max < 0 || need_resched()) {
+			if (!list_empty(&object->read_list))
+				schedule_work(&object->read_work);
+			_leave(" [maxed out]");
+			return;
+		}
+
+		spin_lock_irq(&object->work_lock);
+	}
+
+	spin_unlock_irq(&object->work_lock);
+
+	_leave("");
+
+}
+
+/*****************************************************************************/
+/*
+ * read the corresponding page to the given set from the backing file
+ * - an uncertain page is simply discarded, to be tried again another time
+ */
+static int cachefiles_read_backing_file_one(struct cachefiles_object *object,
+					    fscache_rw_complete_t end_io_func,
+					    void *context,
+					    struct page *netpage,
+					    struct pagevec *lru_pvec)
+{
+	struct cachefiles_one_read *monitor;
+	struct address_space *bmapping;
+	struct page *newpage, *backpage;
+	int ret;
+
+	_enter("");
+
+	ASSERTCMP(pagevec_count(lru_pvec), ==, 0);
+	pagevec_reinit(lru_pvec);
+
+	_debug("read back %p{%lu,%d}",
+	       netpage, netpage->index, page_count(netpage));
+
+	monitor = kzalloc(sizeof(*monitor), GFP_KERNEL);
+	if (!monitor)
+		goto nomem;
+
+	monitor->netfs_page = netpage;
+	monitor->object = object;
+	monitor->end_io_func = end_io_func;
+	monitor->context = fscache_get_context(object->fscache.cookie,
+					       context);
+
+	init_waitqueue_func_entry(&monitor->monitor, cachefiles_read_waiter);
+
+	/* attempt to get hold of the backing page */
+	bmapping = object->backer->d_inode->i_mapping;
+	newpage = NULL;
+
+	for (;;) {
+		backpage = find_get_page(bmapping, netpage->index);
+		if (backpage)
+			goto backing_page_already_present;
+
+		if (!newpage) {
+			newpage = page_cache_alloc_cold(bmapping);
+			if (!newpage)
+				goto nomem_monitor;
+		}
+
+		ret = add_to_page_cache(newpage, bmapping,
+					netpage->index, GFP_KERNEL);
+		if (ret == 0)
+			goto installed_new_backing_page;
+		if (ret != -EEXIST)
+			goto nomem_page;
+	}
+
+	/* we've installed a new backing page, so now we need to add it
+	 * to the LRU list and start it reading */
+installed_new_backing_page:
+	_debug("- new %p", newpage);
+
+	backpage = newpage;
+	newpage = NULL;
+
+	page_cache_get(backpage);
+	pagevec_add(lru_pvec, backpage);
+	__pagevec_lru_add(lru_pvec);
+
+	ret = bmapping->a_ops->readpage(NULL, backpage);
+	if (ret < 0)
+		goto read_error;
+
+	/* set the monitor to transfer the data across */
+monitor_backing_page:
+	_debug("- monitor add");
+
+	/* install the monitor */
+	page_cache_get(monitor->netfs_page);
+	page_cache_get(backpage);
+	monitor->back_page = backpage;
+
+	spin_lock_irq(&object->work_lock);
+	list_add_tail(&monitor->obj_link, &object->read_pend_list);
+	spin_unlock_irq(&object->work_lock);
+
+	monitor->monitor.private = backpage;
+	install_page_waitqueue_monitor(backpage, &monitor->monitor);
+	monitor = NULL;
+
+	/* but the page may have been read before the monitor was
+	 * installed, so the monitor may miss the event - so we have to
+	 * ensure that we do get one in such a case */
+	if (!TestSetPageLocked(backpage))
+		unlock_page(backpage);
+	goto success;
+
+	/* if the backing page is already present, it can be in one of
+	 * three states: read in progress, read failed or read okay */
+backing_page_already_present:
+	_debug("- present");
+
+	if (newpage) {
+		page_cache_release(newpage);
+		newpage = NULL;
+	}
+
+	if (PageError(backpage))
+		goto io_error;
+
+	if (PageUptodate(backpage))
+		goto backing_page_already_uptodate;
+
+	goto monitor_backing_page;
+
+	/* the backing page is already up to date, attach the netfs
+	 * page to the pagecache and LRU and copy the data across */
+backing_page_already_uptodate:
+	_debug("- uptodate");
+
+	copy_highpage(netpage, backpage);
+	end_io_func(netpage, context, 0);
+
+success:
+	_debug("success");
+	ret = 0;
+
+out:
+	if (backpage)
+		page_cache_release(backpage);
+	if (monitor) {
+		fscache_put_context(object->fscache.cookie, monitor->context);
+		kfree(monitor);
+	}
+
+	_leave(" = %d", ret);
+	return ret;
+
+read_error:
+	_debug("read error %d", ret);
+	if (ret == -ENOMEM)
+		goto out;
+io_error:
+	cachefiles_io_error_obj(object, "page read error on backing file");
+	ret = -EIO;
+	goto out;
+
+nomem_page:
+	page_cache_release(newpage);
+nomem_monitor:
+	fscache_put_context(object->fscache.cookie, monitor->context);
+	kfree(monitor);
+nomem:
+	_leave(" = -ENOMEM");
+	return -ENOMEM;
+
+}
+
+/*****************************************************************************/
+/*
+ * read a page from the cache or allocate a block in which to store it
+ * - cache withdrawal is prevented by the caller
+ * - returns -EINTR if interrupted
+ * - returns -ENOMEM if ran out of memory
+ * - returns -ENOBUFS if no buffers can be made available
+ * - returns -ENOBUFS if page is beyond EOF
+ * - if the page is backed by a block in the cache:
+ *   - a read will be started which will call the callback on completion
+ *   - 0 will be returned
+ * - else if the page is unbacked:
+ *   - the metadata will be retained
+ *   - -ENODATA will be returned
+ */
+static int cachefiles_read_or_alloc_page(struct fscache_object *_object,
+					 struct page *page,
+					 fscache_rw_complete_t end_io_func,
+					 void *context,
+					 unsigned long gfp)
+{
+	struct cachefiles_object *object;
+	struct cachefiles_cache *cache;
+	struct fscache_cookie *cookie;
+	struct pagevec pagevec;
+	struct inode *inode;
+	sector_t block0, block;
+	unsigned shift;
+	int ret;
+
+	object = container_of(_object, struct cachefiles_object, fscache);
+	cache = container_of(object->fscache.cache, struct cachefiles_cache, cache);
+
+	_enter("{%p},{%lx},,,", object, page->index);
+
+	if (!object->backer)
+		return -ENOBUFS;
+
+	inode = object->backer->d_inode;
+	ASSERT(S_ISREG(inode->i_mode));
+	ASSERT(inode->i_mapping->a_ops->bmap);
+	ASSERT(inode->i_mapping->a_ops->readpages);
+
+	/* calculate the shift required to use bmap */
+	if (inode->i_sb->s_blocksize > PAGE_SIZE)
+		return -ENOBUFS;
+
+	shift = PAGE_SHIFT - inode->i_sb->s_blocksize_bits;
+
+	cookie = object->fscache.cookie;
+
+	pagevec_init(&pagevec, 0);
+
+	/* we assume the absence or presence of the first block is a good
+	 * enough indication for the page as a whole
+	 * - TODO: don't use bmap() for this as it is _not_ actually good
+	 *   enough for this as it doesn't indicate errors, but it's all we've
+	 *   got for the moment
+	 */
+	block0 = page->index;
+	block0 <<= shift;
+
+	block = inode->i_mapping->a_ops->bmap(inode->i_mapping, block0);
+	_debug("%llx -> %llx", block0, block);
+
+	if (block) {
+		/* submit the apparently valid page to the backing fs to be
+		 * read from disk */
+		ret = cachefiles_read_backing_file_one(object,
+						       end_io_func,
+						       context,
+						       page,
+						       &pagevec);
+		ret = 0;
+	} else if (cachefiles_has_space(cache, 1) == 0) {
+		/* there's space in the cache we can use */
+		pagevec_add(&pagevec, page);
+		cookie->def->mark_pages_cached(cookie->netfs_data,
+					       page->mapping, &pagevec);
+		ret = -ENODATA;
+	} else {
+		ret = -ENOBUFS;
+	}
+
+	_leave(" = %d", ret);
+	return ret;
+
+}
+
+/*****************************************************************************/
+/*
+ * read the corresponding pages to the given set from the backing file
+ * - any uncertain pages are simply discarded, to be tried again another time
+ */
+static int cachefiles_read_backing_file(struct cachefiles_object *object,
+					fscache_rw_complete_t end_io_func,
+					void *context,
+					struct address_space *mapping,
+					struct list_head *list,
+					struct pagevec *lru_pvec)
+{
+	struct cachefiles_one_read *monitor = NULL;
+	struct address_space *bmapping = object->backer->d_inode->i_mapping;
+	struct page *newpage = NULL, *netpage, *_n, *backpage = NULL;
+	int ret = 0;
+
+	_enter("");
+
+	ASSERTCMP(pagevec_count(lru_pvec), ==, 0);
+	pagevec_reinit(lru_pvec);
+
+	list_for_each_entry_safe(netpage, _n, list, lru) {
+		list_del(&netpage->lru);
+
+		_debug("read back %p{%lu,%d}",
+		       netpage, netpage->index, page_count(netpage));
+
+		if (!monitor) {
+			monitor = kzalloc(sizeof(*monitor), GFP_KERNEL);
+			if (!monitor)
+				goto nomem;
+
+			monitor->object = object;
+			monitor->end_io_func = end_io_func;
+			monitor->context = fscache_get_context(
+				object->fscache.cookie, context);
+
+			init_waitqueue_func_entry(&monitor->monitor,
+						  cachefiles_read_waiter);
+		}
+
+		for (;;) {
+			backpage = find_get_page(bmapping, netpage->index);
+			if (backpage)
+				goto backing_page_already_present;
+
+			if (!newpage) {
+				newpage = page_cache_alloc_cold(bmapping);
+				if (!newpage)
+					goto nomem;
+			}
+
+			ret = add_to_page_cache(newpage, bmapping,
+						netpage->index, GFP_KERNEL);
+			if (ret == 0)
+				goto installed_new_backing_page;
+			if (ret != -EEXIST)
+				goto nomem;
+		}
+
+		/* we've installed a new backing page, so now we need to add it
+		 * to the LRU list and start it reading */
+	installed_new_backing_page:
+		_debug("- new %p", newpage);
+
+		backpage = newpage;
+		newpage = NULL;
+
+		page_cache_get(backpage);
+		if (!pagevec_add(lru_pvec, backpage))
+			__pagevec_lru_add(lru_pvec);
+
+	reread_backing_page:
+		ret = bmapping->a_ops->readpage(NULL, backpage);
+		if (ret < 0)
+			goto read_error;
+
+		/* add the netfs page to the pagecache and LRU, and set the
+		 * monitor to transfer the data across */
+	monitor_backing_page:
+		_debug("- monitor add");
+
+		ret = add_to_page_cache(netpage, mapping, netpage->index,
+					GFP_KERNEL);
+		if (ret < 0) {
+			if (ret == -EEXIST) {
+				page_cache_release(netpage);
+				continue;
+			}
+			goto nomem;
+		}
+
+		page_cache_get(netpage);
+		if (!pagevec_add(lru_pvec, netpage))
+			__pagevec_lru_add(lru_pvec);
+
+		/* install a monitor */
+		page_cache_get(netpage);
+		monitor->netfs_page = netpage;
+
+		page_cache_get(backpage);
+		monitor->back_page = backpage;
+
+		spin_lock_irq(&object->work_lock);
+		list_add_tail(&monitor->obj_link, &object->read_pend_list);
+		spin_unlock_irq(&object->work_lock);
+
+		monitor->monitor.private = backpage;
+		install_page_waitqueue_monitor(backpage, &monitor->monitor);
+		monitor = NULL;
+
+		/* but the page may have been read before the monitor was
+		 * installed, so the monitor may miss the event - so we have to
+		 * ensure that we do get one in such a case */
+		if (!TestSetPageLocked(backpage)) {
+			_debug("2unlock %p", backpage);
+			unlock_page(backpage);
+		}
+
+		page_cache_release(backpage);
+		backpage = NULL;
+
+		page_cache_release(netpage);
+		netpage = NULL;
+		continue;
+
+		/* if the backing page is already present, it can be in one of
+		 * three states: read in progress, read failed or read okay */
+	backing_page_already_present:
+		_debug("- present %p", backpage);
+
+		if (PageError(backpage))
+			goto io_error;
+
+		if (PageUptodate(backpage))
+			goto backing_page_already_uptodate;
+
+		_debug("- not ready %p{%lx}", backpage, backpage->flags);
+
+		if (TestSetPageLocked(backpage))
+			goto monitor_backing_page;
+
+		if (PageError(backpage)) {
+			unlock_page(backpage);
+			goto io_error;
+		}
+
+		if (PageUptodate(backpage))
+			goto backing_page_already_uptodate_unlock;
+
+		/* we've locked a page that's neither up to date nor erroneous,
+		 * so we need to attempt to read it again */
+		goto reread_backing_page;
+
+		/* the backing page is already up to date, attach the netfs
+		 * page to the pagecache and LRU and copy the data across */
+	backing_page_already_uptodate_unlock:
+		unlock_page(backpage);
+	backing_page_already_uptodate:
+		_debug("- uptodate");
+
+		ret = add_to_page_cache(netpage, mapping, netpage->index,
+					GFP_KERNEL);
+		if (ret < 0) {
+			if (ret == -EEXIST) {
+				page_cache_release(netpage);
+				continue;
+			}
+			goto nomem;
+		}
+
+		copy_highpage(netpage, backpage);
+
+		page_cache_release(backpage);
+		backpage = NULL;
+
+		page_cache_get(netpage);
+		if (!pagevec_add(lru_pvec, netpage))
+			__pagevec_lru_add(lru_pvec);
+
+		end_io_func(netpage, context, 0);
+
+		page_cache_release(netpage);
+		netpage = NULL;
+		continue;
+	}
+
+	netpage = NULL;
+
+	_debug("out");
+
+out:
+	/* tidy up */
+	pagevec_lru_add(lru_pvec);
+
+	if (newpage)
+		page_cache_release(newpage);
+	if (netpage)
+		page_cache_release(netpage);
+	if (backpage)
+		page_cache_release(backpage);
+	if (monitor) {
+		fscache_put_context(object->fscache.cookie, monitor->context);
+		kfree(monitor);
+	}
+
+	list_for_each_entry_safe(netpage, _n, list, lru) {
+		list_del(&netpage->lru);
+		page_cache_release(netpage);
+	}
+
+	_leave(" = %d", ret);
+	return ret;
+
+nomem:
+	_debug("nomem");
+	ret = -ENOMEM;
+	goto out;
+
+read_error:
+	_debug("read error %d", ret);
+	if (ret == -ENOMEM)
+		goto out;
+io_error:
+	cachefiles_io_error_obj(object, "page read error on backing file");
+	ret = -EIO;
+	goto out;
+
+}
+
+/*****************************************************************************/
+/*
+ * read a list of pages from the cache or allocate blocks in which to store
+ * them
+ */
+static int cachefiles_read_or_alloc_pages(struct fscache_object *_object,
+					  struct address_space *mapping,
+					  struct list_head *pages,
+					  unsigned *nr_pages,
+					  fscache_rw_complete_t end_io_func,
+					  void *context,
+					  unsigned long gfp)
+{
+	struct cachefiles_object *object;
+	struct cachefiles_cache *cache;
+	struct fscache_cookie *cookie;
+	struct list_head backpages;
+	struct pagevec pagevec;
+	struct inode *inode;
+	struct page *page, *_n;
+	unsigned shift, nrbackpages;
+	int ret, ret2, space;
+
+	object = container_of(_object, struct cachefiles_object, fscache);
+	cache = container_of(object->fscache.cache, struct cachefiles_cache, cache);
+
+	_enter("{%p},,%d,,", object, *nr_pages);
+
+	if (!object->backer)
+		return -ENOBUFS;
+
+	space = 1;
+	if (cachefiles_has_space(cache, *nr_pages) < 0)
+		space = 0;
+
+	inode = object->backer->d_inode;
+	ASSERT(S_ISREG(inode->i_mode));
+	ASSERT(inode->i_mapping->a_ops->bmap);
+	ASSERT(inode->i_mapping->a_ops->readpages);
+
+	/* calculate the shift required to use bmap */
+	if (inode->i_sb->s_blocksize > PAGE_SIZE)
+		return -ENOBUFS;
+
+	shift = PAGE_SHIFT - inode->i_sb->s_blocksize_bits;
+
+	pagevec_init(&pagevec, 0);
+
+	cookie = object->fscache.cookie;
+
+	INIT_LIST_HEAD(&backpages);
+	nrbackpages = 0;
+
+	ret = space ? -ENODATA : -ENOBUFS;
+	list_for_each_entry_safe(page, _n, pages, lru) {
+		sector_t block0, block;
+
+		/* we assume the absence or presence of the first block is a
+		 * good enough indication for the page as a whole
+		 * - TODO: don't use bmap() for this as it is _not_ actually
+		 *   good enough for this as it doesn't indicate errors, but
+		 *   it's all we've got for the moment
+		 */
+		block0 = page->index;
+		block0 <<= shift;
+
+		block = inode->i_mapping->a_ops->bmap(inode->i_mapping,
+						      block0);
+		_debug("%llx -> %llx", block0, block);
+
+		if (block) {
+			/* we have data - add it to the list to give to the
+			 * backing fs */
+			list_move(&page->lru, &backpages);
+			(*nr_pages)--;
+			nrbackpages++;
+		} else if (space && pagevec_add(&pagevec, page) == 0) {
+			cookie->def->mark_pages_cached(cookie->netfs_data,
+						       mapping, &pagevec);
+			pagevec_reinit(&pagevec);
+			ret = -ENODATA;
+		}
+	}
+
+	if (pagevec_count(&pagevec) > 0) {
+		cookie->def->mark_pages_cached(cookie->netfs_data,
+					       mapping, &pagevec);
+		pagevec_reinit(&pagevec);
+	}
+
+	if (list_empty(pages))
+		ret = 0;
+
+	/* submit the apparently valid pages to the backing fs to be read from disk */
+	if (nrbackpages > 0) {
+		ret2 = cachefiles_read_backing_file(object,
+						    end_io_func,
+						    context,
+						    mapping,
+						    &backpages,
+						    &pagevec);
+
+		ASSERTCMP(pagevec_count(&pagevec), ==, 0);
+
+		if (ret2 == -ENOMEM || ret2 == -EINTR)
+			ret = ret2;
+	}
+
+	_leave(" = %d [nr=%u%s]",
+	       ret, *nr_pages, list_empty(pages) ? " empty" : "");
+	return ret;
+
+}
+
+/*****************************************************************************/
+/*
+ * read a page from the cache or allocate a block in which to store it
+ * - cache withdrawal is prevented by the caller
+ * - returns -EINTR if interrupted
+ * - returns -ENOMEM if ran out of memory
+ * - returns -ENOBUFS if no buffers can be made available
+ * - returns -ENOBUFS if page is beyond EOF
+ * - otherwise:
+ *   - the metadata will be retained
+ *   - 0 will be returned
+ */
+static int cachefiles_allocate_page(struct fscache_object *_object,
+				    struct page *page,
+				    unsigned long gfp)
+{
+	struct cachefiles_object *object;
+	struct cachefiles_cache *cache;
+
+	object = container_of(_object, struct cachefiles_object, fscache);
+	cache = container_of(object->fscache.cache,
+			     struct cachefiles_cache, cache);
+
+	_enter("%p,{%lx},,,", object, page->index);
+
+	return cachefiles_has_space(cache, 1);
+
+}
+
+/*****************************************************************************/
+/*
+ * page storer
+ */
+void cachefiles_write_work(void *_object)
+{
+	struct cachefiles_one_write *writer;
+	struct cachefiles_object *object = _object;
+	int ret, max;
+
+	_enter("%p", object);
+
+	ASSERT(!irqs_disabled());
+
+	spin_lock_irq(&object->work_lock);
+	max = 8;
+
+	while (!list_empty(&object->write_list)) {
+		writer = list_entry(object->write_list.next,
+				    struct cachefiles_one_write, obj_link);
+		list_del(&writer->obj_link);
+
+		spin_unlock_irq(&object->work_lock);
+
+		_debug("- store {%lu}", writer->netfs_page->index);
+
+		ret = generic_file_buffered_write_one_kernel_page(
+			object->backer->d_inode->i_mapping,
+			writer->netfs_page->index,
+			writer->netfs_page);
+
+		if (ret == -ENOSPC) {
+			ret = -ENOBUFS;
+		} else if (ret == -EIO) {
+			cachefiles_io_error_obj(object,
+						"write page to backing file"
+						" failed");
+			ret = -ENOBUFS;
+		}
+
+		_debug("- callback");
+		writer->end_io_func(writer->netfs_page,
+				    writer->context,
+				    ret);
+
+		_debug("- put net");
+		page_cache_release(writer->netfs_page);
+		fscache_put_context(object->fscache.cookie, writer->context);
+		kfree(writer);
+
+		/* let keventd have some air occasionally */
+		max--;
+		if (max < 0 || need_resched()) {
+			if (!list_empty(&object->write_list))
+				schedule_work(&object->write_work);
+			_leave(" [maxed out]");
+			return;
+		}
+
+		_debug("- next");
+		spin_lock_irq(&object->work_lock);
+	}
+
+	spin_unlock_irq(&object->work_lock);
+	_leave("");
+
+}
+
+/*****************************************************************************/
+/*
+ * request a page be stored in the cache
+ * - cache withdrawal is prevented by the caller
+ * - this request may be ignored if there's no cache block available, in which
+ *   case -ENOBUFS will be returned
+ * - if the op is in progress, 0 will be returned
+ */
+static int cachefiles_write_page(struct fscache_object *_object,
+				 struct page *page,
+				 fscache_rw_complete_t end_io_func,
+				 void *context,
+				 unsigned long gfp)
+{
+//	struct cachefiles_one_write *writer;
+	struct cachefiles_object *object;
+	int ret;
+
+	object = container_of(_object, struct cachefiles_object, fscache);
+
+	_enter("%p,%p{%lx},,,", object, page, page->index);
+
+	if (!object->backer)
+		return -ENOBUFS;
+
+	ASSERT(S_ISREG(object->backer->d_inode->i_mode));
+
+#if 0 // set to 1 for deferred writing
+	/* queue the operation for deferred processing by keventd */
+	writer = kzalloc(sizeof(*writer), GFP_KERNEL);
+	if (!writer)
+		return -ENOMEM;
+
+	page_cache_get(page);
+	writer->netfs_page = page;
+	writer->object = object;
+	writer->end_io_func = end_io_func;
+	writer->context = facache_get_context(object->fscache.cookie, context);
+
+	spin_lock_irq(&object->work_lock);
+	list_add_tail(&writer->obj_link, &object->write_list);
+	spin_unlock_irq(&object->work_lock);
+
+	schedule_work(&object->write_work);
+	ret = 0;
+
+#else
+	/* copy the page to ext3 and let it store it in its own time */
+	ret = generic_file_buffered_write_one_kernel_page(
+		object->backer->d_inode->i_mapping, page->index, page);
+
+	if (ret != 0) {
+		if (ret == -EIO)
+			cachefiles_io_error_obj(object,
+						"write page to backing file"
+						" failed");
+		ret = -ENOBUFS;
+	}
+
+	end_io_func(page, context, ret);
+#endif
+
+	_leave(" = %d", ret);
+	return ret;
+
+}
+
+/*****************************************************************************/
+/*
+ * detach a backing block from a page
+ * - cache withdrawal is prevented by the caller
+ */
+static void cachefiles_uncache_pages(struct fscache_object *_object,
+				     struct pagevec *pagevec)
+{
+	struct cachefiles_object *object;
+	struct cachefiles_cache *cache;
+
+	object = container_of(_object, struct cachefiles_object, fscache);
+	cache = container_of(object->fscache.cache,
+			     struct cachefiles_cache, cache);
+
+	_enter("%p,{%lu,%lx},,,",
+	       object, pagevec->nr, pagevec->pages[0]->index);
+
+}
+
+/*****************************************************************************/
+/*
+ * dissociate a cache from all the pages it was backing
+ */
+static void cachefiles_dissociate_pages(struct fscache_cache *cache)
+{
+	_enter("");
+
+}
+
+struct fscache_cache_ops cachefiles_cache_ops = {
+	.name			= "cachefiles",
+	.lookup_object		= cachefiles_lookup_object,
+	.grab_object		= cachefiles_grab_object,
+	.lock_object		= cachefiles_lock_object,
+	.unlock_object		= cachefiles_unlock_object,
+	.update_object		= cachefiles_update_object,
+	.put_object		= cachefiles_put_object,
+	.sync_cache		= cachefiles_sync_cache,
+	.set_i_size		= cachefiles_set_i_size,
+	.read_or_alloc_page	= cachefiles_read_or_alloc_page,
+	.read_or_alloc_pages	= cachefiles_read_or_alloc_pages,
+	.allocate_page		= cachefiles_allocate_page,
+	.write_page		= cachefiles_write_page,
+	.uncache_pages		= cachefiles_uncache_pages,
+	.dissociate_pages	= cachefiles_dissociate_pages,
+};
diff --git a/fs/cachefiles/cf-key.c b/fs/cachefiles/cf-key.c
new file mode 100644
index 0000000..cfae5a7
--- /dev/null
+++ b/fs/cachefiles/cf-key.c
@@ -0,0 +1,157 @@
+/* cf-key.c: Key to pathname encoder
+ *
+ * Copyright (C) 2006 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/slab.h>
+#include "internal.h"
+
+static const char cachefiles_charmap[64] =
+	"0123456789"			/* 0 - 9 */
+	"abcdefghijklmnopqrstuvwxyz"	/* 10 - 35 */
+	"ABCDEFGHIJKLMNOPQRSTUVWXYZ"	/* 36 - 61 */
+	"_-"				/* 62 - 63 */
+	;
+
+static const char cachefiles_filecharmap[256] = {
+	/* we skip space and tab and control chars */
+	[ 33 ... 46 ] = 1,		/* '!' -> '.' */
+	/* we skip '/' as it's significant to pathwalk */
+	[ 48 ... 127 ] = 1,		/* '0' -> '~' */
+};
+
+/*****************************************************************************/
+/*
+ * turn the raw key into something cooked
+ * - the raw key should include the length in the two bytes at the front
+ * - the key may be up to 514 bytes in length (including the length word)
+ *   - "base64" encode the strange keys, mapping 3 bytes of raw to four of
+ *     cooked
+ *   - need to cut the cooked key into 252 char lengths (189 raw bytes)
+ */
+char *cachefiles_cook_key(const u8 *raw, int keylen, uint8_t type)
+{
+	unsigned char csum, ch;
+	unsigned int acc;
+	char *key;
+	int loop, len, max, seg, mark, print;
+
+	_enter(",%d", keylen);
+
+	BUG_ON(keylen < 2 || keylen > 514);
+
+	csum = raw[0] + raw[1];
+	print = 1;
+	for (loop = 2; loop < keylen; loop++) {
+		ch = raw[loop];
+		csum += ch;
+		print &= cachefiles_filecharmap[ch];
+	}
+
+	if (print) {
+		/* if the path is usable ASCII, then we render it directly */
+		max = keylen - 2;
+		max += 2;	/* two base64'd length chars on the front */
+		max += 5;	/* @checksum/M */
+		max += 3 * 2;	/* maximum number of segment dividers (".../M")
+				 * is ((514 + 251) / 252) = 3
+				 */
+		max += 1;	/* NUL on end */
+	} else {
+		/* calculate the maximum length of the cooked key */
+		keylen = (keylen + 2) / 3;
+
+		max = keylen * 4;
+		max += 5;	/* @checksum/M */
+		max += 3 * 2;	/* maximum number of segment dividers (".../M")
+				 * is ((514 + 188) / 189) = 3
+				 */
+		max += 1;	/* NUL on end */
+	}
+
+	_debug("max: %d", max);
+
+	key = kmalloc(max, GFP_KERNEL);
+	if (!key)
+		return NULL;
+
+	len = 0;
+
+	/* build the cooked key */
+	sprintf(key, "@%02x/+", (unsigned) csum);
+	len = 5;
+	mark = len - 1;
+
+	if (print) {
+		acc = *(uint16_t *) raw;
+		raw += 2;
+
+		key[len + 1] = cachefiles_charmap[acc & 63];
+		acc >>= 6;
+		key[len] = cachefiles_charmap[acc & 63];
+		len += 2;
+
+		seg = 250;
+		for (loop = keylen; loop > 0; loop--) {
+			if (seg <= 0) {
+				key[len++] = '/';
+				mark = len;
+				key[len++] = '+';
+				seg = 252;
+			}
+
+			key[len++] = *raw++;
+			ASSERT(len < max);
+		}
+
+		switch (type) {
+		case FSCACHE_COOKIE_TYPE_INDEX:		type = 'I';	break;
+		case FSCACHE_COOKIE_TYPE_DATAFILE:	type = 'D';	break;
+		default:				type = 'S';	break;
+		}
+	} else {
+		seg = 252;
+		for (loop = keylen; loop > 0; loop--) {
+			if (seg <= 0) {
+				key[len++] = '/';
+				mark = len;
+				key[len++] = '+';
+				seg = 252;
+			}
+
+			acc = *raw++;
+			acc |= *raw++ << 8;
+			acc |= *raw++ << 16;
+
+			_debug("acc: %06x", acc);
+
+			key[len++] = cachefiles_charmap[acc & 63];
+			acc >>= 6;
+			key[len++] = cachefiles_charmap[acc & 63];
+			acc >>= 6;
+			key[len++] = cachefiles_charmap[acc & 63];
+			acc >>= 6;
+			key[len++] = cachefiles_charmap[acc & 63];
+
+			ASSERT(len < max);
+		}
+
+		switch (type) {
+		case FSCACHE_COOKIE_TYPE_INDEX:		type = 'J';	break;
+		case FSCACHE_COOKIE_TYPE_DATAFILE:	type = 'E';	break;
+		default:				type = 'T';	break;
+		}
+	}
+
+	key[mark] = type;
+	key[len] = 0;
+
+	_leave(" = %p %d:[%s]", key, len, key);
+	return key;
+}
diff --git a/fs/cachefiles/cf-main.c b/fs/cachefiles/cf-main.c
new file mode 100644
index 0000000..da8d020
--- /dev/null
+++ b/fs/cachefiles/cf-main.c
@@ -0,0 +1,129 @@
+/* cf-main.c: network filesystem caching backend to use cache files on a
+ *            premounted filesystem
+ *
+ * Copyright (C) 2006 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/sched.h>
+#include <linux/completion.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/namei.h>
+#include <linux/mount.h>
+#include <linux/statfs.h>
+#include <linux/proc_fs.h>
+#include <linux/sysctl.h>
+#include "internal.h"
+
+unsigned long cachefiles_debug;
+
+static int cachefiles_init(void);
+static void cachefiles_exit(void);
+
+fs_initcall(cachefiles_init);
+module_exit(cachefiles_exit);
+
+MODULE_DESCRIPTION("Mounted-filesystem based cache");
+MODULE_AUTHOR("Red Hat, Inc.");
+MODULE_LICENSE("GPL");
+
+kmem_cache_t *cachefiles_object_jar;
+
+static void cachefiles_object_init_once(void *_object, kmem_cache_t *cachep,
+					unsigned long flags)
+{
+	struct cachefiles_object *object = _object;
+
+	switch (flags & (SLAB_CTOR_VERIFY | SLAB_CTOR_CONSTRUCTOR)) {
+	case SLAB_CTOR_CONSTRUCTOR:
+		memset(object, 0, sizeof(*object));
+		fscache_object_init(&object->fscache);
+		init_rwsem(&object->sem);
+		spin_lock_init(&object->work_lock);
+		INIT_LIST_HEAD(&object->read_list);
+		INIT_LIST_HEAD(&object->read_pend_list);
+		INIT_WORK(&object->read_work, &cachefiles_read_copier_work,
+			  object);
+		INIT_LIST_HEAD(&object->write_list);
+		INIT_WORK(&object->write_work, &cachefiles_write_work, object);
+		break;
+
+	default:
+		break;
+	}
+}
+
+/*****************************************************************************/
+/*
+ * initialise the fs caching module
+ */
+static int __init cachefiles_init(void)
+{
+	struct proc_dir_entry *pde;
+	int ret;
+
+	/* create a proc entry to use as a handle for the userspace daemon */
+	ret = -ENOMEM;
+
+	pde = create_proc_entry("cachefiles", 0600, proc_root_fs);
+	if (!pde) {
+		kerror("Unable to create /proc/fs/cachefiles");
+		goto error_proc;
+	}
+
+	if (cachefiles_sysctl_init() < 0) {
+		kerror("Unable to create sysctl parameters");
+		goto error_object_jar;
+	}
+
+	pde->owner = THIS_MODULE;
+	pde->proc_fops = &cachefiles_proc_fops;
+	cachefiles_proc = pde;
+
+	/* create an object jar */
+	cachefiles_object_jar =
+		kmem_cache_create("cachefiles_object_jar",
+				  sizeof(struct cachefiles_object),
+				  0,
+				  SLAB_HWCACHE_ALIGN,
+				  cachefiles_object_init_once,
+				  NULL);
+	if (!cachefiles_object_jar) {
+		printk(KERN_NOTICE
+		       "CacheFiles: Failed to allocate an object jar\n");
+		goto error_sysctl;
+	}
+
+	printk(KERN_INFO "CacheFiles: Loaded\n");
+	return 0;
+
+error_sysctl:
+	cachefiles_sysctl_cleanup();
+error_object_jar:
+	remove_proc_entry("cachefiles", proc_root_fs);
+error_proc:
+	kerror("failed to register: %d", ret);
+	return ret;
+}
+
+/*****************************************************************************/
+/*
+ * clean up on module removal
+ */
+static void __exit cachefiles_exit(void)
+{
+	printk(KERN_INFO "CacheFiles: Unloading\n");
+
+	kmem_cache_destroy(cachefiles_object_jar);
+	remove_proc_entry("cachefiles", proc_root_fs);
+	cachefiles_sysctl_cleanup();
+}
diff --git a/fs/cachefiles/cf-namei.c b/fs/cachefiles/cf-namei.c
new file mode 100644
index 0000000..20db88a
--- /dev/null
+++ b/fs/cachefiles/cf-namei.c
@@ -0,0 +1,825 @@
+/* cf-namei.c: CacheFiles path walking and related routines
+ *
+ * Copyright (C) 2006 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/fsnotify.h>
+#include <linux/quotaops.h>
+#include <linux/xattr.h>
+#include <linux/mount.h>
+#include <linux/namei.h>
+#include "internal.h"
+
+/*****************************************************************************/
+/*
+ * record the fact that an object is now active
+ */
+static void cachefiles_mark_object_active(struct cachefiles_cache *cache,
+					  struct cachefiles_object *object)
+{
+	struct cachefiles_object *xobject;
+	struct rb_node **_p, *_parent = NULL;
+	struct dentry *dentry;
+
+	write_lock(&cache->active_lock);
+
+	dentry = object->dentry;
+	_p = &cache->active_nodes.rb_node;
+	while (*_p) {
+		_parent = *_p;
+		xobject = rb_entry(_parent,
+				   struct cachefiles_object, active_node);
+
+		if (xobject->dentry > dentry)
+			_p = &(*_p)->rb_left;
+		else if (xobject->dentry < dentry)
+			_p = &(*_p)->rb_right;
+		else
+			BUG(); /* uh oh... this dentry shouldn't be here */
+	}
+
+	rb_link_node(&object->active_node, _parent, _p);
+	rb_insert_color(&object->active_node, &cache->active_nodes);
+
+	write_unlock(&cache->active_lock);
+}
+
+/*****************************************************************************/
+/*
+ * delete an object representation from the cache
+ * - file backed objects are unlinked
+ * - directory backed objects are stuffed into the graveyard for userspace to
+ *   delete
+ * - unlocks the directory mutex
+ */
+static int cachefiles_bury_object(struct cachefiles_cache *cache,
+				  struct dentry *dir,
+				  struct dentry *rep)
+{
+	struct dentry *grave, *alt, *trap;
+	struct qstr name;
+	const char *old_name;
+	char nbuffer[8 + 8 + 1];
+	int ret;
+
+	_enter(",'%*.*s','%*.*s'",
+	       dir->d_name.len, dir->d_name.len, dir->d_name.name,
+	       rep->d_name.len, rep->d_name.len, rep->d_name.name);
+
+	/* non-directories can just be unlinked */
+	if (!S_ISDIR(rep->d_inode->i_mode)) {
+		_debug("unlink stale object");
+		ret = dir->d_inode->i_op->unlink(dir->d_inode, rep);
+
+		mutex_unlock(&dir->d_inode->i_mutex);
+
+		if (ret == 0) {
+			_debug("d_delete");
+			d_delete(rep);
+		} else if (ret == -EIO) {
+			cachefiles_io_error(cache, "Unlink failed");
+		}
+
+		_leave(" = %d", ret);
+		return ret;
+	}
+
+	/* directories have to be moved to the graveyard */
+	_debug("move stale object to graveyard");
+	mutex_unlock(&dir->d_inode->i_mutex);
+
+try_again:
+	/* first step is to make up a grave dentry in the graveyard */
+	sprintf(nbuffer, "%08x%08x",
+		(uint32_t) xtime.tv_sec,
+		(uint32_t) atomic_inc_return(&cache->gravecounter));
+
+	name.name = nbuffer;
+	name.len = strlen(name.name);
+
+	/* hash the name */
+	name.hash = full_name_hash(name.name, name.len);
+
+	if (dir->d_op && dir->d_op->d_hash) {
+		ret = dir->d_op->d_hash(dir, &name);
+		if (ret < 0) {
+			if (ret == -EIO)
+				cachefiles_io_error(cache, "Hash failed");
+
+			_leave(" = %d", ret);
+			return ret;
+		}
+	}
+
+	/* do the multiway lock magic */
+	trap = lock_rename(cache->graveyard, dir);
+
+	/* do some checks before getting the grave dentry */
+	if (rep->d_parent != dir) {
+		/* the entry was probably culled when we dropped the parent dir
+		 * lock */
+		unlock_rename(cache->graveyard, dir);
+		_leave(" = 0 [culled?]");
+		return 0;
+	}
+
+	if (!S_ISDIR(cache->graveyard->d_inode->i_mode)) {
+		unlock_rename(cache->graveyard, dir);
+		cachefiles_io_error(cache, "Graveyard no longer a directory");
+		return -EIO;
+	}
+
+	if (trap == rep) {
+		unlock_rename(cache->graveyard, dir);
+		cachefiles_io_error(cache, "May not make directory loop");
+		return -EIO;
+	}
+
+	if (d_mountpoint(rep)) {
+		unlock_rename(cache->graveyard, dir);
+		cachefiles_io_error(cache, "Mountpoint in cache");
+		return -EIO;
+	}
+
+	/* see if there's a dentry already there for this name */
+	grave = d_lookup(cache->graveyard, &name);
+	if (!grave) {
+		_debug("not found");
+
+		grave = d_alloc(cache->graveyard, &name);
+		if (!grave) {
+			unlock_rename(cache->graveyard, dir);
+			_leave(" = -ENOMEM");
+			return -ENOMEM;
+		}
+
+		alt = cache->graveyard->d_inode->i_op->lookup(
+			cache->graveyard->d_inode, grave, NULL);
+		if (IS_ERR(alt)) {
+			unlock_rename(cache->graveyard, dir);
+			dput(grave);
+
+			if (PTR_ERR(alt) == -ENOMEM) {
+				_leave(" = -ENOMEM");
+				return -ENOMEM;
+			}
+
+			cachefiles_io_error(cache, "Lookup error %ld",
+					    PTR_ERR(alt));
+			return -EIO;
+		}
+
+		if (alt) {
+			dput(grave);
+			grave = alt;
+		}
+	}
+
+	if (grave->d_inode) {
+		unlock_rename(cache->graveyard, dir);
+		dput(grave);
+		grave = NULL;
+		cond_resched();
+		goto try_again;
+	}
+
+	if (d_mountpoint(grave)) {
+		unlock_rename(cache->graveyard, dir);
+		dput(grave);
+		cachefiles_io_error(cache, "Mountpoint in graveyard");
+		return -EIO;
+	}
+
+	/* target should not be an ancestor of source */
+	if (trap == grave) {
+		unlock_rename(cache->graveyard, dir);
+		dput(grave);
+		cachefiles_io_error(cache, "May not make directory loop");
+		return -EIO;
+	}
+
+	/* attempt the rename */
+	DQUOT_INIT(dir->d_inode);
+	DQUOT_INIT(cache->graveyard->d_inode);
+
+	old_name = fsnotify_oldname_init(rep->d_name.name);
+
+	ret = dir->d_inode->i_op->rename(dir->d_inode, rep,
+					 cache->graveyard->d_inode, grave);
+
+	if (ret == 0) {
+		d_move(rep, grave);
+		fsnotify_move(dir->d_inode, cache->graveyard->d_inode,
+			      old_name, rep->d_name.name, 1,
+			      grave->d_inode, rep->d_inode);
+	} else if (ret != -ENOMEM) {
+		cachefiles_io_error(cache, "Rename failed with error %d", ret);
+	}
+
+	fsnotify_oldname_free(old_name);
+
+	unlock_rename(cache->graveyard, dir);
+	dput(grave);
+	_leave(" = 0");
+	return 0;
+}
+
+/*****************************************************************************/
+/*
+ * delete an object representation from the cache
+ */
+int cachefiles_delete_object(struct cachefiles_cache *cache,
+			     struct cachefiles_object *object)
+{
+	struct dentry *dir;
+	int ret;
+
+	_enter(",{%p}", object->dentry);
+
+	ASSERT(object->dentry);
+	ASSERT(object->dentry->d_inode);
+	ASSERT(object->dentry->d_parent);
+
+	dir = dget_parent(object->dentry);
+
+	mutex_lock(&dir->d_inode->i_mutex);
+	ret = cachefiles_bury_object(cache, dir, object->dentry);
+
+	dput(dir);
+	_leave(" = %d", ret);
+	return ret;
+}
+
+/*****************************************************************************/
+/*
+ * walk from the parent object to the child object through the backing
+ * filesystem, creating directories as we go
+ */
+int cachefiles_walk_to_object(struct cachefiles_object *parent,
+			      struct cachefiles_object *object,
+			      char *key,
+			      struct cachefiles_xattr *auxdata)
+{
+	struct cachefiles_cache *cache;
+	struct dentry *dir, *next = NULL, *new;
+	struct qstr name;
+	uid_t fsuid;
+	gid_t fsgid;
+	int ret;
+
+	_enter("{%p}", parent->dentry);
+
+	cache = container_of(parent->fscache.cache,
+			     struct cachefiles_cache, cache);
+
+	ASSERT(parent->dentry);
+	ASSERT(parent->dentry->d_inode);
+
+	if (!(S_ISDIR(parent->dentry->d_inode->i_mode))) {
+		// TODO: convert file to dir
+		_leave("looking up in none directory");
+		return -ENOBUFS;
+	}
+
+	fsuid = current->fsuid;
+	fsgid = current->fsgid;
+	current->fsuid = 0;
+	current->fsgid = 0;
+
+	dir = dget(parent->dentry);
+
+advance:
+	/* attempt to transit the first directory component */
+	name.name = key;
+	key = strchr(key, '/');
+	if (key) {
+		name.len = key - (char *) name.name;
+		*key++ = 0;
+	} else {
+		name.len = strlen(name.name);
+	}
+
+	/* hash the name */
+	name.hash = full_name_hash(name.name, name.len);
+
+	if (dir->d_op && dir->d_op->d_hash) {
+		ret = dir->d_op->d_hash(dir, &name);
+		if (ret < 0) {
+			cachefiles_io_error(cache, "Hash failed");
+			goto error_out2;
+		}
+	}
+
+lookup_again:
+	/* search the current directory for the element name */
+	_debug("lookup '%s' %x", name.name, name.hash);
+
+	mutex_lock(&dir->d_inode->i_mutex);
+
+	next = d_lookup(dir, &name);
+	if (!next) {
+		_debug("not found");
+
+		new = d_alloc(dir, &name);
+		if (!new)
+			goto nomem_d_alloc;
+
+		ASSERT(dir->d_inode->i_op);
+		ASSERT(dir->d_inode->i_op->lookup);
+
+		next = dir->d_inode->i_op->lookup(dir->d_inode, new, NULL);
+		if (IS_ERR(next))
+			goto lookup_error;
+
+		if (!next)
+			next = new;
+		else
+			dput(new);
+
+		if (next->d_inode) {
+			ret = -EPERM;
+			if (!next->d_inode->i_op ||
+			    !next->d_inode->i_op->setxattr ||
+			    !next->d_inode->i_op->getxattr ||
+			    !next->d_inode->i_op->removexattr)
+				goto error;
+
+			if (key && (!next->d_inode->i_op->lookup ||
+				    !next->d_inode->i_op->mkdir ||
+				    !next->d_inode->i_op->create ||
+				    !next->d_inode->i_op->rename ||
+				    !next->d_inode->i_op->rmdir ||
+				    !next->d_inode->i_op->unlink))
+				goto error;
+		}
+	}
+
+	_debug("next -> %p %s", next, next->d_inode ? "positive" : "negative");
+
+	if (!key)
+		object->new = !next->d_inode;
+
+	/* we need to create the object if it's negative */
+	if (key || object->type == FSCACHE_COOKIE_TYPE_INDEX) {
+		/* index objects and intervening tree levels must be subdirs */
+		if (!next->d_inode) {
+			DQUOT_INIT(dir->d_inode);
+			ret = dir->d_inode->i_op->mkdir(dir->d_inode, next, 0);
+			if (ret < 0)
+				goto create_error;
+
+			ASSERT(next->d_inode);
+
+			fsnotify_mkdir(dir->d_inode, next);
+
+			_debug("mkdir -> %p{%p{ino=%lu}}",
+			       next, next->d_inode, next->d_inode->i_ino);
+
+		} else if (!S_ISDIR(next->d_inode->i_mode)) {
+			kerror("inode %lu is not a directory",
+			       next->d_inode->i_ino);
+			ret = -ENOBUFS;
+			goto error;
+		}
+
+	} else {
+		/* non-index objects start out life as files */
+		if (!next->d_inode) {
+			DQUOT_INIT(dir->d_inode);
+			ret = dir->d_inode->i_op->create(dir->d_inode, next,
+							 S_IFREG, NULL);
+			if (ret < 0)
+				goto create_error;
+
+			ASSERT(next->d_inode);
+
+			fsnotify_create(dir->d_inode, next);
+
+			_debug("create -> %p{%p{ino=%lu}}",
+			       next, next->d_inode, next->d_inode->i_ino);
+
+		} else if (!S_ISDIR(next->d_inode->i_mode) &&
+			   !S_ISREG(next->d_inode->i_mode)
+			   ) {
+			kerror("inode %lu is not a file or directory",
+			       next->d_inode->i_ino);
+			ret = -ENOBUFS;
+			goto error;
+		}
+	}
+
+	/* process the next component */
+	if (key) {
+		_debug("advance");
+		mutex_unlock(&dir->d_inode->i_mutex);
+		dput(dir);
+		dir = next;
+		next = NULL;
+		goto advance;
+	}
+
+	/* we've found the object we were looking for */
+	object->dentry = next;
+
+	/* if we've found that the terminal object exists, then we need to
+	 * check its attributes and delete it if it's out of date */
+	if (!object->new) {
+		_debug("validate '%*.*s'",
+		       next->d_name.len, next->d_name.len, next->d_name.name);
+
+		ret = cachefiles_check_object_xattr(object, auxdata);
+		if (ret == -ESTALE) {
+			/* delete the object (the deleter drops the directory
+			 * mutex) */
+			object->dentry = NULL;
+
+			ret = cachefiles_bury_object(cache, dir, next);
+			dput(next);
+			next = NULL;
+
+			if (ret < 0)
+				goto delete_error;
+
+			_debug("redo lookup");
+			goto lookup_again;
+		}
+	}
+
+	/* note that we're now using this object */
+	cachefiles_mark_object_active(cache, object);
+
+	mutex_unlock(&dir->d_inode->i_mutex);
+	dput(dir);
+	dir = NULL;
+
+	if (object->new) {
+		/* attach data to a newly constructed terminal object */
+		ret = cachefiles_set_object_xattr(object, auxdata);
+		if (ret < 0)
+			goto check_error;
+	} else {
+		/* always update the atime on an object we've just looked up
+		 * (this is used to keep track of culling, and atimes are only
+		 * updated by read, write and readdir but not lookup or
+		 * open) */
+		touch_atime(cache->mnt, next);
+	}
+
+	/* open a file interface onto a data file */
+	if (object->type != FSCACHE_COOKIE_TYPE_INDEX) {
+		if (S_ISREG(object->dentry->d_inode->i_mode)) {
+			const struct address_space_operations *aops;
+
+			ret = -EPERM;
+			aops = object->dentry->d_inode->i_mapping->a_ops;
+			if (!aops->bmap ||
+			    !aops->prepare_write ||
+			    !aops->commit_write)
+				goto check_error;
+
+			object->backer = object->dentry;
+		} else {
+			BUG(); // TODO: open file in data-class subdir
+		}
+	}
+
+	current->fsuid = fsuid;
+	current->fsgid = fsgid;
+	object->new = 0;
+
+	_leave(" = 0 [%lu]", object->dentry->d_inode->i_ino);
+	return 0;
+
+create_error:
+	if (ret == -EIO)
+		cachefiles_io_error(cache, "create/mkdir failed");
+	goto error;
+
+check_error:
+	write_lock(&cache->active_lock);
+	rb_erase(&object->active_node, &cache->active_nodes);
+	write_unlock(&cache->active_lock);
+
+	dput(object->dentry);
+	object->dentry = NULL;
+	goto error_out;
+
+delete_error:
+	_debug("delete error %d", ret);
+	goto error_out2;
+
+lookup_error:
+	_debug("lookup error %ld", PTR_ERR(next));
+	dput(new);
+	ret = PTR_ERR(next);
+	if (ret == -EIO)
+		cachefiles_io_error(cache, "Lookup failed");
+	next = NULL;
+	goto error;
+
+nomem_d_alloc:
+	ret = -ENOMEM;
+error:
+	mutex_unlock(&dir->d_inode->i_mutex);
+	dput(next);
+error_out2:
+	dput(dir);
+error_out:
+	current->fsuid = fsuid;
+	current->fsgid = fsgid;
+
+	_leave(" = ret");
+	return ret;
+}
+
+/*****************************************************************************/
+/*
+ * get a subdirectory
+ */
+struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
+					struct dentry *dir,
+					const char *dirname)
+{
+	struct dentry *subdir, *new;
+	struct qstr name;
+	uid_t fsuid;
+	gid_t fsgid;
+	int ret;
+
+	_enter("");
+
+	/* set up the name */
+	name.name = dirname;
+	name.len = strlen(dirname);
+	name.hash = full_name_hash(name.name, name.len);
+
+	if (dir->d_op && dir->d_op->d_hash) {
+		ret = dir->d_op->d_hash(dir, &name);
+		if (ret < 0) {
+			if (ret == -EIO)
+				kerror("Hash failed");
+			_leave(" = %d", ret);
+			return ERR_PTR(ret);
+		}
+	}
+
+	/* search the current directory for the element name */
+	_debug("lookup '%s' %x", name.name, name.hash);
+
+	fsuid = current->fsuid;
+	fsgid = current->fsgid;
+	current->fsuid = 0;
+	current->fsgid = 0;
+
+	mutex_lock(&dir->d_inode->i_mutex);
+
+	subdir = d_lookup(dir, &name);
+	if (!subdir) {
+		_debug("not found");
+
+		new = d_alloc(dir, &name);
+		if (!new)
+			goto nomem_d_alloc;
+
+		subdir = dir->d_inode->i_op->lookup(dir->d_inode, new, NULL);
+		if (IS_ERR(subdir))
+			goto lookup_error;
+
+		if (!subdir)
+			subdir = new;
+		else
+			dput(new);
+	}
+
+	_debug("subdir -> %p %s",
+	       subdir, subdir->d_inode ? "positive" : "negative");
+
+	/* we need to create the subdir if it doesn't exist yet */
+	if (!subdir->d_inode) {
+		DQUOT_INIT(dir->d_inode);
+		ret = dir->d_inode->i_op->mkdir(dir->d_inode, subdir, 0700);
+		if (ret < 0)
+			goto mkdir_error;
+
+		ASSERT(subdir->d_inode);
+
+		fsnotify_mkdir(dir->d_inode, subdir);
+
+		_debug("mkdir -> %p{%p{ino=%lu}}",
+		       subdir,
+		       subdir->d_inode,
+		       subdir->d_inode->i_ino);
+	}
+
+	mutex_unlock(&dir->d_inode->i_mutex);
+
+	current->fsuid = fsuid;
+	current->fsgid = fsgid;
+
+	/* we need to make sure the subdir is a directory */
+	ASSERT(subdir->d_inode);
+
+	if (!S_ISDIR(subdir->d_inode->i_mode)) {
+		kerror("%s is not a directory", dirname);
+		ret = -EIO;
+		goto check_error;
+	}
+
+	ret = -EPERM;
+	if (!subdir->d_inode->i_op ||
+	    !subdir->d_inode->i_op->setxattr ||
+	    !subdir->d_inode->i_op->getxattr ||
+	    !subdir->d_inode->i_op->lookup ||
+	    !subdir->d_inode->i_op->mkdir ||
+	    !subdir->d_inode->i_op->create ||
+	    !subdir->d_inode->i_op->rename ||
+	    !subdir->d_inode->i_op->rmdir ||
+	    !subdir->d_inode->i_op->unlink)
+		goto check_error;
+
+	_leave(" = [%lu]", subdir->d_inode->i_ino);
+	return subdir;
+
+check_error:
+	dput(subdir);
+	_leave(" = %d [check]", ret);
+	return ERR_PTR(ret);
+
+mkdir_error:
+	mutex_unlock(&dir->d_inode->i_mutex);
+	kerror("mkdir %s failed with error %d", dirname, ret);
+	goto error_out;
+
+lookup_error:
+	mutex_unlock(&dir->d_inode->i_mutex);
+	dput(new);
+	ret = PTR_ERR(subdir);
+	kerror("Lookup %s failed with error %d", dirname, ret);
+	goto error_out;
+
+nomem_d_alloc:
+	mutex_unlock(&dir->d_inode->i_mutex);
+	ret = -ENOMEM;
+	goto error_out;
+
+error_out:
+	current->fsuid = fsuid;
+	current->fsgid = fsgid;
+	_leave(" = %d", ret);
+	return ERR_PTR(ret);
+}
+
+/*****************************************************************************/
+/*
+ * cull an object if it's not in use
+ * - called only by cache manager daemon
+ */
+int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
+		    char *filename)
+{
+	struct cachefiles_object *object;
+	struct rb_node *_n;
+	struct dentry *victim, *new;
+	struct qstr name;
+	int ret;
+
+	_enter(",%*.*s/,%s",
+	       dir->d_name.len, dir->d_name.len, dir->d_name.name, filename);
+
+	/* set up the name */
+	name.name = filename;
+	name.len = strlen(filename);
+	name.hash = full_name_hash(name.name, name.len);
+
+	if (dir->d_op && dir->d_op->d_hash) {
+		ret = dir->d_op->d_hash(dir, &name);
+		if (ret < 0) {
+			if (ret == -EIO)
+				cachefiles_io_error(cache, "Hash failed");
+			_leave(" = %d", ret);
+			return ret;
+		}
+	}
+
+	/* look up the victim */
+	mutex_lock(&dir->d_inode->i_mutex);
+
+	victim = d_lookup(dir, &name);
+	if (!victim) {
+		_debug("not found");
+
+		new = d_alloc(dir, &name);
+		if (!new)
+			goto nomem_d_alloc;
+
+		victim = dir->d_inode->i_op->lookup(dir->d_inode, new, NULL);
+		if (IS_ERR(victim))
+			goto lookup_error;
+
+		if (!victim)
+			victim = new;
+		else
+			dput(new);
+	}
+
+	_debug("victim -> %p %s",
+	       victim, victim->d_inode ? "positive" : "negative");
+
+	/* if the object is no longer there then we probably retired the object
+	 * at the netfs's request whilst the cull was in progress
+	 */
+	if (!victim->d_inode) {
+		mutex_unlock(&dir->d_inode->i_mutex);
+		dput(victim);
+		_leave(" = -ENOENT [absent]");
+		return -ENOENT;
+	}
+
+	/* check to see if we're using this object */
+	read_lock(&cache->active_lock);
+
+	_n = cache->active_nodes.rb_node;
+
+	while (_n) {
+		object = rb_entry(_n, struct cachefiles_object, active_node);
+
+		if (object->dentry > victim)
+			_n = _n->rb_left;
+		else if (object->dentry < victim)
+			_n = _n->rb_right;
+		else
+			goto object_in_use;
+	}
+
+	read_unlock(&cache->active_lock);
+
+	/* okay... the victim is not being used so we can cull it
+	 * - start by marking it as stale
+	 */
+	_debug("victim is cullable");
+
+	ret = cachefiles_remove_object_xattr(cache, victim);
+	if (ret < 0)
+		goto error_unlock;
+
+	/*  actually remove the victim (drops the dir mutex) */
+	_debug("bury");
+
+	ret = cachefiles_bury_object(cache, dir, victim);
+	if (ret < 0)
+		goto error;
+
+	dput(victim);
+	_leave(" = 0");
+	return 0;
+
+
+object_in_use:
+	read_unlock(&cache->active_lock);
+	mutex_unlock(&dir->d_inode->i_mutex);
+	dput(victim);
+	_leave(" = -EBUSY [in use]");
+	return -EBUSY;
+
+nomem_d_alloc:
+	mutex_unlock(&dir->d_inode->i_mutex);
+	_leave(" = -ENOMEM");
+	return -ENOMEM;
+
+lookup_error:
+	mutex_unlock(&dir->d_inode->i_mutex);
+	dput(new);
+	ret = PTR_ERR(victim);
+	if (ret == -EIO)
+		cachefiles_io_error(cache, "Lookup failed");
+	goto choose_error;
+
+error_unlock:
+	mutex_unlock(&dir->d_inode->i_mutex);
+error:
+	dput(victim);
+choose_error:
+	if (ret == -ENOENT) {
+		/* file or dir now absent - probably retired by netfs */
+		_leave(" = -ESTALE [absent]");
+		return -ESTALE;
+	}
+
+	if (ret != -ENOMEM) {
+		kerror("Internal error: %d", ret);
+		ret = -EIO;
+	}
+
+	_leave(" = %d", ret);
+	return ret;
+}
diff --git a/fs/cachefiles/cf-proc.c b/fs/cachefiles/cf-proc.c
new file mode 100644
index 0000000..a4a056b
--- /dev/null
+++ b/fs/cachefiles/cf-proc.c
@@ -0,0 +1,498 @@
+/* cf-proc.c: /proc/fs/cachefiles interface
+ *
+ * Copyright (C) 2006 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/sched.h>
+#include <linux/completion.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/namei.h>
+#include <linux/mount.h>
+#include <linux/namespace.h>
+#include <linux/statfs.h>
+#include <linux/proc_fs.h>
+#include <linux/ctype.h>
+#include "internal.h"
+
+static int cachefiles_proc_open(struct inode *, struct file *);
+static int cachefiles_proc_release(struct inode *, struct file *);
+static ssize_t cachefiles_proc_read(struct file *, char __user *, size_t, loff_t *);
+static ssize_t cachefiles_proc_write(struct file *, const char __user *, size_t, loff_t *);
+static int cachefiles_proc_brun(struct cachefiles_cache *cache, char *args);
+static int cachefiles_proc_bcull(struct cachefiles_cache *cache, char *args);
+static int cachefiles_proc_bstop(struct cachefiles_cache *cache, char *args);
+static int cachefiles_proc_cull(struct cachefiles_cache *cache, char *args);
+static int cachefiles_proc_debug(struct cachefiles_cache *cache, char *args);
+static int cachefiles_proc_dir(struct cachefiles_cache *cache, char *args);
+static int cachefiles_proc_tag(struct cachefiles_cache *cache, char *args);
+
+struct proc_dir_entry *cachefiles_proc;
+
+static unsigned long cachefiles_open;
+
+struct file_operations cachefiles_proc_fops = {
+	.open		= cachefiles_proc_open,
+	.release	= cachefiles_proc_release,
+	.read		= cachefiles_proc_read,
+	.write		= cachefiles_proc_write,
+};
+
+struct cachefiles_proc_cmd {
+	char name[8];
+	int (*handler)(struct cachefiles_cache *cache, char *args);
+};
+
+static const struct cachefiles_proc_cmd cachefiles_proc_cmds[] = {
+	{ "bind",	cachefiles_proc_bind	},
+	{ "brun",	cachefiles_proc_brun	},
+	{ "bcull",	cachefiles_proc_bcull	},
+	{ "bstop",	cachefiles_proc_bstop	},
+	{ "cull",	cachefiles_proc_cull	},
+	{ "debug",	cachefiles_proc_debug	},
+	{ "dir",	cachefiles_proc_dir	},
+	{ "tag",	cachefiles_proc_tag	},
+	{ "",		NULL			}
+};
+
+
+/*****************************************************************************/
+/*
+ * do various checks
+ */
+static int cachefiles_proc_open(struct inode *inode, struct file *file)
+{
+	struct cachefiles_cache *cache;
+
+	_enter("");
+
+	/* only the superuser may do this */
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	/* /proc/fs/cachefiles may only be open once at a time */
+	if (xchg(&cachefiles_open, 1) == 1)
+		return -EBUSY;
+
+	/* allocate a cache record */
+	cache = kzalloc(sizeof(struct cachefiles_cache), GFP_KERNEL);
+	if (!cache) {
+		cachefiles_open = 0;
+		return -ENOMEM;
+	}
+
+	cache->active_nodes = RB_ROOT;
+	rwlock_init(&cache->active_lock);
+
+	/* set default caching limits
+	 * - limit at 1% free space
+	 * - cull below 5% free space
+	 * - cease culling above 7% free space
+	 */
+	cache->brun_percent = 7;
+	cache->bcull_percent = 5;
+	cache->bstop_percent = 1;
+
+	file->private_data = cache;
+	cache->cachefilesd = file;
+	return 0;
+}
+
+/*****************************************************************************/
+/*
+ * release a cache
+ */
+static int cachefiles_proc_release(struct inode *inode, struct file *file)
+{
+	struct cachefiles_cache *cache = file->private_data;
+
+	_enter("");
+
+	ASSERT(cache);
+
+	set_bit(CACHEFILES_DEAD, &cache->flags);
+
+	cachefiles_proc_unbind(cache);
+
+	ASSERT(!cache->active_nodes.rb_node);
+
+	/* clean up the control file interface */
+	cache->cachefilesd = NULL;
+	file->private_data = NULL;
+	cachefiles_open = 0;
+
+	kfree(cache);
+
+	_leave("");
+	return 0;
+}
+
+/*****************************************************************************/
+/*
+ * read the cache state
+ */
+static ssize_t cachefiles_proc_read(struct file *file, char __user *_buffer,
+				    size_t buflen, loff_t *pos)
+{
+	struct cachefiles_cache *cache = file->private_data;
+	char buffer[256];
+	int n;
+
+	_enter(",,%zu,", buflen);
+
+	if (!test_bit(CACHEFILES_READY, &cache->flags))
+		return 0;
+
+	/* check how much space the cache has */
+	cachefiles_has_space(cache, 0);
+
+	/* summarise */
+	n = snprintf(buffer, sizeof(buffer),
+		     "cull=%c"
+		     " brun=%llx"
+		     " bcull=%llx"
+		     " bstop=%llx",
+		     test_bit(CACHEFILES_CULLING, &cache->flags) ? '1' : '0',
+		     cache->brun,
+		     cache->bcull,
+		     cache->bstop
+		     );
+
+	if (n > buflen)
+		return -EMSGSIZE;
+
+	if (copy_to_user(_buffer, buffer, n) != 0)
+		return -EFAULT;
+
+	return n;
+}
+
+/*****************************************************************************/
+/*
+ * command the cache
+ */
+static ssize_t cachefiles_proc_write(struct file *file,
+				     const char __user *_data, size_t datalen,
+				     loff_t *pos)
+{
+	const struct cachefiles_proc_cmd *cmd;
+	struct cachefiles_cache *cache = file->private_data;
+	ssize_t ret;
+	char *data, *args, *cp;
+
+	_enter(",,%zu,", datalen);
+
+	ASSERT(cache);
+
+	if (test_bit(CACHEFILES_DEAD, &cache->flags))
+		return -EIO;
+
+	if (datalen < 0 || datalen > PAGE_SIZE - 1)
+		return -EOPNOTSUPP;
+
+	/* drag the command string into the kernel so we can parse it */
+	data = kmalloc(datalen + 1, GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
+	ret = -EFAULT;
+	if (copy_from_user(data, _data, datalen) != 0)
+		goto error;
+
+	data[datalen] = '\0';
+
+	ret = -EINVAL;
+	if (memchr(data, '\0', datalen))
+		goto error;
+
+	/* strip any newline */
+	cp = memchr(data, '\n', datalen);
+	if (cp) {
+		if (cp == data)
+			goto error;
+
+		*cp = '\0';
+	}
+
+	/* parse the command */
+	ret = -EOPNOTSUPP;
+
+	for (args = data; *args; args++)
+		if (isspace(*args))
+			break;
+	if (*args) {
+		if (args == data)
+			goto error;
+		*args = '\0';
+		for (args++; isspace(*args); args++)
+			continue;
+	}
+
+	/* run the appropriate command handler */
+	for (cmd = cachefiles_proc_cmds; cmd->name[0]; cmd++)
+		if (strcmp(cmd->name, data) == 0)
+			goto found_command;
+
+error:
+	kfree(data);
+	_leave(" = %zd", ret);
+	return ret;
+
+found_command:
+	mutex_lock(&file->f_dentry->d_inode->i_mutex);
+
+	ret = -EIO;
+	if (!test_bit(CACHEFILES_DEAD, &cache->flags))
+		ret = cmd->handler(cache, args);
+
+	mutex_unlock(&file->f_dentry->d_inode->i_mutex);
+
+	if (ret == 0)
+		ret = datalen;
+	goto error;
+}
+
+/*****************************************************************************/
+/*
+ * give a range error for cache space constraints
+ * - can be tail-called
+ */
+static int cachefiles_proc_range_error(struct cachefiles_cache *cache, char *args)
+{
+	kerror("Free space limits must be in range"
+	       " 0%%<=bstop<bcull<brun<100%%");
+
+	return -EINVAL;
+}
+
+/*****************************************************************************/
+/*
+ * set the percentage of blocks at which to stop culling
+ * - command: "brun <N>%"
+ */
+static int cachefiles_proc_brun(struct cachefiles_cache *cache, char *args)
+{
+	unsigned long brun;
+
+	_enter(",%s", args);
+
+	if (!*args)
+		return -EINVAL;
+
+	brun = simple_strtoul(args, &args, 10);
+	if (args[0] != '%' || args[1] != '\0')
+		return -EINVAL;
+
+	if (brun <= cache->bcull_percent || brun >= 100)
+		return cachefiles_proc_range_error(cache, args);
+
+	cache->brun_percent = brun;
+	return 0;
+}
+
+/*****************************************************************************/
+/*
+ * set the percentage of blocks at which to start culling
+ * - command: "bcull <N>%"
+ */
+static int cachefiles_proc_bcull(struct cachefiles_cache *cache, char *args)
+{
+	unsigned long bcull;
+
+	_enter(",%s", args);
+
+	if (!*args)
+		return -EINVAL;
+
+	bcull = simple_strtoul(args, &args, 10);
+	if (args[0] != '%' || args[1] != '\0')
+		return -EINVAL;
+
+	if (bcull <= cache->bstop_percent || bcull >= cache->brun_percent)
+		return cachefiles_proc_range_error(cache, args);
+
+	cache->bcull_percent = bcull;
+	return 0;
+}
+
+/*****************************************************************************/
+/*
+ * set the percentage of blocks at which to stop allocating
+ * - command: "bstop <N>%"
+ */
+static int cachefiles_proc_bstop(struct cachefiles_cache *cache, char *args)
+{
+	unsigned long bstop;
+
+	_enter(",%s", args);
+
+	if (!*args)
+		return -EINVAL;
+
+	bstop = simple_strtoul(args, &args, 10);
+	if (args[0] != '%' || args[1] != '\0')
+		return -EINVAL;
+
+	if (bstop < 0 || bstop >= cache->bcull_percent)
+		return cachefiles_proc_range_error(cache, args);
+
+	cache->bstop_percent = bstop;
+	return 0;
+}
+
+/*****************************************************************************/
+/*
+ * set the cache directory
+ * - command: "dir <name>"
+ */
+static int cachefiles_proc_dir(struct cachefiles_cache *cache, char *args)
+{
+	char *dir;
+
+	_enter(",%s", args);
+
+	if (!*args) {
+		kerror("Empty directory specified");
+		return -EINVAL;
+	}
+
+	if (cache->rootdirname) {
+		kerror("Second cache directory specified");
+		return -EEXIST;
+	}
+
+	dir = kstrdup(args, GFP_KERNEL);
+	if (!dir)
+		return -ENOMEM;
+
+	cache->rootdirname = dir;
+	return 0;
+}
+
+/*****************************************************************************/
+/*
+ * set the cache tag
+ * - command: "tag <name>"
+ */
+static int cachefiles_proc_tag(struct cachefiles_cache *cache, char *args)
+{
+	char *tag;
+
+	_enter(",%s", args);
+
+	if (!*args) {
+		kerror("Empty tag specified");
+		return -EINVAL;
+	}
+
+	if (cache->tag)
+		return -EEXIST;
+
+	tag = kstrdup(args, GFP_KERNEL);
+	if (!tag)
+		return -ENOMEM;
+
+	cache->tag = tag;
+	return 0;
+}
+
+/*****************************************************************************/
+/*
+ * request a node in the cache be culled
+ * - command: "cull <dirfd> <name>"
+ */
+static int cachefiles_proc_cull(struct cachefiles_cache *cache, char *args)
+{
+	struct dentry *dir;
+	struct file *dirfile;
+	int dirfd, fput_needed, ret;
+
+	_enter(",%s", args);
+
+	dirfd = simple_strtoul(args, &args, 0);
+
+	if (!args || !isspace(*args))
+		goto inval;
+
+	while (isspace(*args))
+		args++;
+
+	if (!*args)
+		goto inval;
+
+	if (strchr(args, '/'))
+		goto inval;
+
+	if (!test_bit(CACHEFILES_READY, &cache->flags)) {
+		kerror("cull applied to unready cache");
+		return -EIO;
+	}
+
+	if (test_bit(CACHEFILES_DEAD, &cache->flags)) {
+		kerror("cull applied to dead cache");
+		return -EIO;
+	}
+
+	/* extract the directory dentry from the fd */
+	dirfile = fget_light(dirfd, &fput_needed);
+	if (!dirfile) {
+		kerror("cull dirfd not open");
+		return -EBADF;
+	}
+
+	dir = dget(dirfile->f_dentry);
+	fput_light(dirfile, fput_needed);
+	dirfile = NULL;
+
+	if (!S_ISDIR(dir->d_inode->i_mode))
+		goto notdir;
+
+	ret = cachefiles_cull(cache, dir, args);
+
+	dput(dir);
+	_leave(" = %d", ret);
+	return ret;
+
+notdir:
+	dput(dir);
+	kerror("cull command requires dirfd to be a directory");
+	return -ENOTDIR;
+
+inval:
+	kerror("cull command requires dirfd and filename");
+	return -EINVAL;
+}
+
+/*****************************************************************************/
+/*
+ * set debugging mode
+ * - command: "debug <mask>"
+ */
+static int cachefiles_proc_debug(struct cachefiles_cache *cache, char *args)
+{
+	unsigned long mask;
+
+	_enter(",%s", args);
+
+	mask = simple_strtoul(args, &args, 0);
+
+	if (!args || !isspace(*args))
+		goto inval;
+
+	cachefiles_debug = mask;
+	_leave(" = 0");
+	return 0;
+
+inval:
+	kerror("debug command requires mask");
+	return -EINVAL;
+}
diff --git a/fs/cachefiles/cf-sysctl.c b/fs/cachefiles/cf-sysctl.c
new file mode 100644
index 0000000..4d6fc04
--- /dev/null
+++ b/fs/cachefiles/cf-sysctl.c
@@ -0,0 +1,69 @@
+/* cf-sysctl.c: Control parameters
+ *
+ * Copyright (C) 2006 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/sysctl.h>
+#include "internal.h"
+
+static struct ctl_table_header *cachefiles_sysctl;
+
+/*
+ * Something that isn't CTL_ANY, CTL_NONE or a value that may clash.
+ * Use the same values as fs/nfs/sysctl.c
+ */
+#define CTL_UNNUMBERED -2
+
+static ctl_table cachefiles_sysctl_table[] = {
+        {
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "debug",
+		.data		= &cachefiles_debug,
+		.maxlen		= sizeof(unsigned long),
+		.mode		= 0644,
+		.proc_handler	= &proc_doulongvec_minmax
+	},
+	{ .ctl_name = 0 }
+};
+
+static ctl_table cachefiles_sysctl_dir[] = {
+	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "cachefiles",
+		.maxlen		= 0,
+		.mode		= 0555,
+		.child		= cachefiles_sysctl_table
+	},
+	{ .ctl_name = 0 }
+};
+
+static ctl_table cachefiles_sysctl_root[] = {
+	{
+		.ctl_name = CTL_FS,
+		.procname = "fs",
+		.mode = 0555,
+		.child = cachefiles_sysctl_dir,
+	},
+	{ .ctl_name = 0 }
+};
+
+int __init cachefiles_sysctl_init(void)
+{
+	cachefiles_sysctl = register_sysctl_table(cachefiles_sysctl_root, 0);
+	if (!cachefiles_sysctl)
+		return -ENOMEM;
+	return 0;
+}
+
+void __exit cachefiles_sysctl_cleanup(void)
+{
+	unregister_sysctl_table(cachefiles_sysctl);
+}
diff --git a/fs/cachefiles/cf-xattr.c b/fs/cachefiles/cf-xattr.c
new file mode 100644
index 0000000..8952bf0
--- /dev/null
+++ b/fs/cachefiles/cf-xattr.c
@@ -0,0 +1,295 @@
+/* cf-xattr.c: CacheFiles extended attribute management
+ *
+ * Copyright (C) 2006 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/fsnotify.h>
+#include <linux/quotaops.h>
+#include <linux/xattr.h>
+#include "internal.h"
+
+static const char cachefiles_xattr_cache[] = XATTR_USER_PREFIX "CacheFiles.cache";
+
+/*****************************************************************************/
+/*
+ * check the type label on an object
+ * - done using xattrs
+ */
+int cachefiles_check_object_type(struct cachefiles_object *object)
+{
+	struct dentry *dentry = object->dentry;
+	char type[3], xtype[3];
+	int ret;
+
+	ASSERT(dentry);
+	ASSERT(dentry->d_inode);
+	ASSERT(dentry->d_inode->i_op);
+	ASSERT(dentry->d_inode->i_op->setxattr);
+	ASSERT(dentry->d_inode->i_op->getxattr);
+
+	if (!object->fscache.cookie)
+		strcpy(type, "C3");
+	else
+		snprintf(type, 3, "%02x", object->fscache.cookie->def->type);
+
+	_enter("%p{%s}", object, type);
+
+	mutex_lock(&dentry->d_inode->i_mutex);
+
+	/* attempt to install a type label directly */
+	ret = dentry->d_inode->i_op->setxattr(dentry, cachefiles_xattr_cache,
+					      type, 2, XATTR_CREATE);
+	if (ret == 0) {
+		_debug("SET");
+		fsnotify_xattr(dentry);
+		mutex_unlock(&dentry->d_inode->i_mutex);
+		goto error;
+	}
+
+	if (ret != -EEXIST) {
+		kerror("Can't set xattr on %*.*s [%lu] (err %d)",
+		       dentry->d_name.len, dentry->d_name.len,
+		       dentry->d_name.name, dentry->d_inode->i_ino,
+		       -ret);
+		goto error;
+	}
+
+	/* read the current type label */
+	ret = dentry->d_inode->i_op->getxattr(dentry, cachefiles_xattr_cache,
+					      xtype, 3);
+	if (ret < 0) {
+		if (ret == -ERANGE)
+			goto bad_type_length;
+
+		kerror("Can't read xattr on %*.*s [%lu] (err %d)",
+		       dentry->d_name.len, dentry->d_name.len,
+		       dentry->d_name.name, dentry->d_inode->i_ino,
+		       -ret);
+		goto error;
+	}
+
+	/* check the type is what we're expecting */
+	if (ret != 2)
+		goto bad_type_length;
+
+	if (xtype[0] != type[0] || xtype[1] != type[1])
+		goto bad_type;
+
+	ret = 0;
+
+error:
+	mutex_unlock(&dentry->d_inode->i_mutex);
+	_leave(" = %d", ret);
+	return ret;
+
+bad_type_length:
+	kerror("Cache object %lu type xattr length incorrect",
+	       dentry->d_inode->i_ino);
+	ret = -EIO;
+	goto error;
+
+bad_type:
+	xtype[2] = 0;
+	kerror("Cache object %*.*s [%lu] type %s not %s",
+	       dentry->d_name.len, dentry->d_name.len,
+	       dentry->d_name.name, dentry->d_inode->i_ino,
+	       xtype, type);
+	ret = -EIO;
+	goto error;
+}
+
+/*****************************************************************************/
+/*
+ * set the state xattr on a cache file
+ */
+int cachefiles_set_object_xattr(struct cachefiles_object *object,
+				struct cachefiles_xattr *auxdata)
+{
+	struct dentry *dentry = object->dentry;
+	int ret;
+
+	ASSERT(object->fscache.cookie);
+	ASSERT(dentry);
+	ASSERT(dentry->d_inode->i_op->setxattr);
+
+	_enter("%p,#%d", object, auxdata->len);
+
+	/* attempt to install the cache metadata directly */
+	mutex_lock(&dentry->d_inode->i_mutex);
+
+	_debug("SET %s #%u",
+	       object->fscache.cookie->def->name, auxdata->len);
+
+	ret = dentry->d_inode->i_op->setxattr(dentry, cachefiles_xattr_cache,
+					      &auxdata->type, auxdata->len,
+					      XATTR_CREATE);
+	if (ret == 0)
+		fsnotify_xattr(dentry);
+	else if (ret != -ENOMEM)
+		cachefiles_io_error_obj(object,
+					"Failed to set xattr with error %d",
+					ret);
+
+	mutex_unlock(&dentry->d_inode->i_mutex);
+	_leave(" = %d", ret);
+	return ret;
+}
+
+/*****************************************************************************/
+/*
+ * check the state xattr on a cache file
+ * - return -ESTALE if the object should be deleted
+ */
+int cachefiles_check_object_xattr(struct cachefiles_object *object,
+				  struct cachefiles_xattr *auxdata)
+{
+	struct cachefiles_xattr *auxbuf;
+	struct dentry *dentry = object->dentry;
+	int ret;
+
+	_enter("%p,#%d", object, auxdata->len);
+
+	ASSERT(dentry);
+	ASSERT(dentry->d_inode);
+	ASSERT(dentry->d_inode->i_op->setxattr);
+	ASSERT(dentry->d_inode->i_op->getxattr);
+
+	auxbuf = kmalloc(sizeof(struct cachefiles_xattr) + 512, GFP_KERNEL);
+	if (!auxbuf) {
+		_leave(" = -ENOMEM");
+		return -ENOMEM;
+	}
+
+	mutex_lock(&dentry->d_inode->i_mutex);
+
+	/* read the current type label */
+	ret = dentry->d_inode->i_op->getxattr(dentry, cachefiles_xattr_cache,
+					      &auxbuf->type, 512 + 1);
+	if (ret < 0) {
+		if (ret == -ENODATA)
+			goto stale; /* no attribute - power went off
+				     * mid-cull? */
+
+		if (ret == -ERANGE)
+			goto bad_type_length;
+
+		cachefiles_io_error_obj(object,
+					"can't read xattr on %lu (err %d)",
+					dentry->d_inode->i_ino, -ret);
+		goto error;
+	}
+
+	/* check the on-disk object */
+	if (ret < 1)
+		goto bad_type_length;
+
+	if (auxbuf->type != auxdata->type)
+		goto stale;
+
+	auxbuf->len = ret;
+
+	/* consult the netfs */
+	if (object->fscache.cookie->def->check_aux) {
+		fscache_checkaux_t result;
+		unsigned int dlen;
+
+		dlen = auxbuf->len - 1;
+
+		_debug("checkaux %s #%u",
+		       object->fscache.cookie->def->name, dlen);
+
+		result = object->fscache.cookie->def->check_aux(
+			object->fscache.cookie->netfs_data,
+			&auxbuf->data, dlen);
+
+		switch (result) {
+			/* entry okay as is */
+		case FSCACHE_CHECKAUX_OKAY:
+			goto okay;
+
+			/* entry requires update */
+		case FSCACHE_CHECKAUX_NEEDS_UPDATE:
+			break;
+
+			/* entry requires deletion */
+		case FSCACHE_CHECKAUX_OBSOLETE:
+			goto stale;
+
+		default:
+			BUG();
+		}
+
+		/* update the current label */
+		ret = dentry->d_inode->i_op->setxattr(dentry,
+						      cachefiles_xattr_cache,
+						      &auxdata->type,
+						      auxdata->len,
+						      XATTR_REPLACE);
+		if (ret < 0) {
+			cachefiles_io_error_obj(object,
+						"Can't update xattr on %lu"
+						" (error %d)",
+						dentry->d_inode->i_ino, -ret);
+			goto error;
+		}
+	}
+
+okay:
+	ret = 0;
+
+error:
+	mutex_unlock(&dentry->d_inode->i_mutex);
+	kfree(auxbuf);
+	_leave(" = %d", ret);
+	return ret;
+
+bad_type_length:
+	kerror("Cache object %lu xattr length incorrect",
+	       dentry->d_inode->i_ino);
+	ret = -EIO;
+	goto error;
+
+stale:
+	ret = -ESTALE;
+	goto error;
+}
+
+/*****************************************************************************/
+/*
+ * remove the object's xattr to mark it stale
+ */
+int cachefiles_remove_object_xattr(struct cachefiles_cache *cache,
+				   struct dentry *dentry)
+{
+	int ret;
+
+	mutex_lock(&dentry->d_inode->i_mutex);
+
+	ret = dentry->d_inode->i_op->removexattr(dentry,
+						 cachefiles_xattr_cache);
+
+	mutex_unlock(&dentry->d_inode->i_mutex);
+
+	if (ret < 0) {
+		if (ret == -ENOENT || ret == -ENODATA)
+			ret = 0;
+		else if (ret != -ENOMEM)
+			cachefiles_io_error(cache,
+					    "Can't remove xattr from %lu"
+					    " (error %d)",
+					    dentry->d_inode->i_ino, -ret);
+	}
+
+	_leave(" = %d", ret);
+	return ret;
+}
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
new file mode 100644
index 0000000..edda6e7
--- /dev/null
+++ b/fs/cachefiles/internal.h
@@ -0,0 +1,308 @@
+/* internal.h: general netfs cache on cache files internal defs
+ *
+ * Copyright (C) 2006 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ *
+ * CacheFiles layout:
+ *
+ *	/..../CacheDir/
+ *		index
+ *		0/
+ *		1/
+ *		2/
+ *		  index
+ *		  0/
+ *		  1/
+ *		  2/
+ *		    index
+ *		    0
+ *		    1
+ *		    2
+ */
+
+#include <linux/fscache-cache.h>
+#include <linux/timer.h>
+#include <linux/wait.h>
+#include <linux/workqueue.h>
+
+struct cachefiles_cache;
+struct cachefiles_object;
+
+extern unsigned long cachefiles_debug;
+#define CACHEFILES_DEBUG_KENTER	1
+#define CACHEFILES_DEBUG_KLEAVE	2
+#define CACHEFILES_DEBUG_KDEBUG	4
+
+extern struct fscache_cache_ops cachefiles_cache_ops;
+extern struct proc_dir_entry *cachefiles_proc;
+extern struct file_operations cachefiles_proc_fops;
+
+/*****************************************************************************/
+/*
+ * node records
+ */
+struct cachefiles_object {
+	struct fscache_object		fscache;	/* fscache handle */
+	struct dentry			*dentry;	/* the file/dir representing this object */
+	struct dentry			*backer;	/* backing file */
+	loff_t				i_size;		/* object size */
+	atomic_t			usage;		/* basic object usage count */
+	atomic_t			fscache_usage;	/* FSDEF object usage count */
+	uint8_t				type;		/* object type */
+	uint8_t				new;		/* T if object new */
+	spinlock_t			work_lock;
+	struct rw_semaphore		sem;
+	struct work_struct		read_work;	/* read page copier */
+	struct list_head		read_list;	/* pages to copy */
+	struct list_head		read_pend_list;	/* pages to pending read from backer */
+	struct work_struct		write_work;	/* page writer */
+	struct list_head		write_list;	/* pages to store */
+	struct rb_node			active_node;	/* link in active tree (dentry is key) */
+};
+
+extern kmem_cache_t *cachefiles_object_jar;
+
+/*****************************************************************************/
+/*
+ * Cache files cache definition
+ */
+struct cachefiles_cache {
+	struct fscache_cache		cache;		/* FS-Cache record */
+	struct vfsmount			*mnt;		/* mountpoint holding the cache */
+	struct dentry			*graveyard;	/* directory into which dead objects go */
+	struct file			*cachefilesd;	/* manager daemon handle */
+	struct rb_root			active_nodes;	/* active nodes (can't be culled) */
+	rwlock_t			active_lock;	/* lock for active_nodes */
+	atomic_t			gravecounter;	/* graveyard uniquifier */
+	unsigned			brun_percent;	/* when to stop culling (%) */
+	unsigned			bcull_percent;	/* when to start culling (%) */
+	unsigned			bstop_percent;	/* when to stop allocating (%) */
+	unsigned			bsize;		/* cache's block size */
+	unsigned			bshift;		/* min(log2 (PAGE_SIZE / bsize), 0) */
+	sector_t			brun;		/* when to stop culling */
+	sector_t			bcull;		/* when to start culling */
+	sector_t			bstop;		/* when to stop allocating */
+	unsigned long			flags;
+#define CACHEFILES_READY		0	/* T if cache prepared */
+#define CACHEFILES_DEAD			1	/* T if cache dead */
+#define CACHEFILES_CULLING		2	/* T if cull engaged */
+	char				*rootdirname;	/* name of cache root directory */
+	char				*tag;		/* cache binding tag */
+};
+
+/*****************************************************************************/
+/*
+ * backing file read tracking
+ */
+struct cachefiles_one_read {
+	wait_queue_t			monitor;	/* link into monitored waitqueue */
+	struct page			*back_page;	/* backing file page we're waiting for */
+	struct page			*netfs_page;	/* netfs page we're going to fill */
+	struct cachefiles_object	*object;
+	struct list_head		obj_link;	/* link in object's lists */
+	fscache_rw_complete_t		end_io_func;
+	void				*context;
+};
+
+/*****************************************************************************/
+/*
+ * backing file write tracking
+ */
+struct cachefiles_one_write {
+	struct page			*netfs_page;	/* netfs page to copy */
+	struct cachefiles_object	*object;
+	struct list_head		obj_link;	/* link in object's lists */
+	fscache_rw_complete_t		end_io_func;
+	void				*context;
+};
+
+/*****************************************************************************/
+/*
+ * auxiliary data xattr buffer
+ */
+struct cachefiles_xattr {
+	uint16_t			len;
+	uint8_t				type;
+	uint8_t				data[];
+};
+
+
+/* cf-bind.c */
+extern int cachefiles_proc_bind(struct cachefiles_cache *cache, char *args);
+extern void cachefiles_proc_unbind(struct cachefiles_cache *cache);
+
+/* cf-interface.c */
+extern void cachefiles_read_copier_work(void *_object);
+extern void cachefiles_write_work(void *_object);
+extern int cachefiles_has_space(struct cachefiles_cache *cache, unsigned nr);
+
+/* cf-key.c */
+extern char *cachefiles_cook_key(const u8 *raw, int keylen, uint8_t type);
+
+/* cf-namei.c */
+extern int cachefiles_delete_object(struct cachefiles_cache *cache,
+				    struct cachefiles_object *object);
+extern int cachefiles_walk_to_object(struct cachefiles_object *parent,
+				     struct cachefiles_object *object,
+				     char *key,
+				     struct cachefiles_xattr *auxdata);
+extern struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
+					       struct dentry *dir,
+					       const char *name);
+
+extern int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
+			   char *filename);
+
+/* cf-sysctl.c */
+extern int __init cachefiles_sysctl_init(void);
+extern void __exit cachefiles_sysctl_cleanup(void);
+
+/* cf-xattr.c */
+extern int cachefiles_check_object_type(struct cachefiles_object *object);
+extern int cachefiles_set_object_xattr(struct cachefiles_object *object,
+				       struct cachefiles_xattr *auxdata);
+extern int cachefiles_check_object_xattr(struct cachefiles_object *object,
+					 struct cachefiles_xattr *auxdata);
+extern int cachefiles_remove_object_xattr(struct cachefiles_cache *cache,
+					  struct dentry *dentry);
+
+
+/*****************************************************************************/
+/*
+ * error handling
+ */
+#define kerror(FMT,...) printk(KERN_ERR "CacheFiles: "FMT"\n" ,##__VA_ARGS__);
+
+#define cachefiles_io_error(___cache, FMT, ...)		\
+do {							\
+	kerror("I/O Error: " FMT ,##__VA_ARGS__);	\
+	fscache_io_error(&(___cache)->cache);		\
+	set_bit(CACHEFILES_DEAD, &(___cache)->flags);	\
+} while(0)
+
+#define cachefiles_io_error_obj(object, FMT, ...)			\
+do {									\
+	struct cachefiles_cache *___cache;				\
+									\
+	___cache = container_of((object)->fscache.cache,		\
+				struct cachefiles_cache, cache);	\
+	cachefiles_io_error(___cache, FMT ,##__VA_ARGS__);		\
+} while(0)
+
+
+/*****************************************************************************/
+/*
+ * debug tracing
+ */
+#define dbgprintk(FMT,...) \
+	printk("[%-6.6s] "FMT"\n",current->comm ,##__VA_ARGS__)
+
+/* make sure we maintain the format strings, even when debugging is disabled */
+static inline void _dbprintk(const char *fmt, ...)
+	__attribute__((format(printf,1,2)));
+static inline void _dbprintk(const char *fmt, ...)
+{
+}
+
+#define kenter(FMT,...)	dbgprintk("==> %s("FMT")",__FUNCTION__ ,##__VA_ARGS__)
+#define kleave(FMT,...)	dbgprintk("<== %s()"FMT"",__FUNCTION__ ,##__VA_ARGS__)
+#define kdebug(FMT,...)	dbgprintk(FMT ,##__VA_ARGS__)
+
+
+#if defined(__KDEBUG)
+#define _enter(FMT,...)	kenter(FMT,##__VA_ARGS__)
+#define _leave(FMT,...)	kleave(FMT,##__VA_ARGS__)
+#define _debug(FMT,...)	kdebug(FMT,##__VA_ARGS__)
+
+#elif defined(CONFIG_CACHEFILES_DEBUG)
+#define _enter(FMT,...)					\
+do {							\
+	if (cachefiles_debug & CACHEFILES_DEBUG_KENTER)	\
+		kenter(FMT,##__VA_ARGS__);		\
+} while (0)
+
+#define _leave(FMT,...)					\
+do {							\
+	if (cachefiles_debug & CACHEFILES_DEBUG_KLEAVE)	\
+		kleave(FMT,##__VA_ARGS__);		\
+} while (0)
+
+#define _debug(FMT,...)					\
+do {							\
+	if (cachefiles_debug & CACHEFILES_DEBUG_KDEBUG)	\
+		kdebug(FMT,##__VA_ARGS__);		\
+} while (0)
+
+#else
+#define _enter(FMT,...)	_dbprintk("==> %s("FMT")",__FUNCTION__ ,##__VA_ARGS__)
+#define _leave(FMT,...)	_dbprintk("<== %s()"FMT"",__FUNCTION__ ,##__VA_ARGS__)
+#define _debug(FMT,...)	_dbprintk(FMT ,##__VA_ARGS__)
+#endif
+
+#if 1 // defined(__KDEBUGALL)
+
+#define ASSERT(X)							\
+do {									\
+	if (unlikely(!(X))) {						\
+		printk(KERN_ERR "\n");					\
+		printk(KERN_ERR "CacheFiles: Assertion failed\n");	\
+		BUG();							\
+	}								\
+} while(0)
+
+#define ASSERTCMP(X, OP, Y)						\
+do {									\
+	if (unlikely(!((X) OP (Y)))) {					\
+		printk(KERN_ERR "\n");					\
+		printk(KERN_ERR "CacheFiles: Assertion failed\n");	\
+		printk(KERN_ERR "%lx " #OP " %lx is false\n",		\
+		       (unsigned long)(X), (unsigned long)(Y));		\
+		BUG();							\
+	}								\
+} while(0)
+
+#define ASSERTIF(C, X)							\
+do {									\
+	if (unlikely((C) && !(X))) {					\
+		printk(KERN_ERR "\n");					\
+		printk(KERN_ERR "CacheFiles: Assertion failed\n");	\
+		BUG();							\
+	}								\
+} while(0)
+
+#define ASSERTIFCMP(C, X, OP, Y)					\
+do {									\
+	if (unlikely((C) && !((X) OP (Y)))) {				\
+		printk(KERN_ERR "\n");					\
+		printk(KERN_ERR "CacheFiles: Assertion failed\n");	\
+		printk(KERN_ERR "%lx " #OP " %lx is false\n",		\
+		       (unsigned long)(X), (unsigned long)(Y));		\
+		BUG();							\
+	}								\
+} while(0)
+
+#else
+
+#define ASSERT(X)				\
+do {						\
+} while(0)
+
+#define ASSERTCMP(X, OP, Y)			\
+do {						\
+} while(0)
+
+#define ASSERTIF(C, X)				\
+do {						\
+} while(0)
+
+#define ASSERTIFCMP(C, X, OP, Y)		\
+do {						\
+} while(0)
+
+#endif
diff --git a/fs/fcntl.c b/fs/fcntl.c
index d35cbc6..b43d821 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -529,6 +529,8 @@ int send_sigurg(struct fown_struct *fown
 	return ret;
 }
 
+EXPORT_SYMBOL(send_sigurg);
+
 static DEFINE_RWLOCK(fasync_lock);
 static kmem_cache_t *fasync_cache __read_mostly;
 
diff --git a/fs/file_table.c b/fs/file_table.c
index 0131ba0..47a3fa8 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -234,6 +234,7 @@ struct file fastcall *fget_light(unsigne
 	return file;
 }
 
+EXPORT_SYMBOL_GPL(fget_light);
 
 void put_filp(struct file *file)
 {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c189e7f..91248d3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1651,6 +1651,8 @@ extern ssize_t generic_file_direct_write
 		unsigned long *, loff_t, loff_t *, size_t, size_t);
 extern ssize_t generic_file_buffered_write(struct kiocb *, const struct iovec *,
 		unsigned long, loff_t, loff_t *, size_t, ssize_t);
+extern int generic_file_buffered_write_one_kernel_page(struct address_space *,
+						       pgoff_t, struct page *);
 extern ssize_t do_sync_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos);
 extern ssize_t do_sync_write(struct file *filp, const char __user *buf, size_t len, loff_t *ppos);
 ssize_t generic_file_write_nolock(struct file *file, const struct iovec *iov,
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 82b2753..acc3481 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -181,6 +181,12 @@ static inline void wait_on_page_fs_misc(
 extern void fastcall end_page_fs_misc(struct page *page);
 
 /*
+ * permit installation of a state change monitor in the queue for a page
+ */
+extern void install_page_waitqueue_monitor(struct page *page,
+					   wait_queue_t *monitor);
+
+/*
  * Fault a userspace page into pagetables.  Return non-zero on a fault.
  *
  * This assumes that two userspace pages are always sufficient.  That's
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index efc1b74..52d941f 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -1349,6 +1349,8 @@ #endif
 	audit_copy_inode(&context->names[idx], inode);
 }
 
+EXPORT_SYMBOL_GPL(__audit_inode_child);
+
 /**
  * auditsc_get_stamp - get local copies of audit_context values
  * @ctx: audit_context for the task
diff --git a/mm/filemap.c b/mm/filemap.c
index 2e365d4..1eb60ea 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -520,6 +520,18 @@ void fastcall wait_on_page_bit(struct pa
 }
 EXPORT_SYMBOL(wait_on_page_bit);
 
+void install_page_waitqueue_monitor(struct page *page, wait_queue_t *monitor)
+{
+	wait_queue_head_t *q = page_waitqueue(page);
+	unsigned long flags;
+
+	spin_lock_irqsave(&q->lock, flags);
+	__add_wait_queue(q, monitor);
+	spin_unlock_irqrestore(&q->lock, flags);
+}
+
+EXPORT_SYMBOL_GPL(install_page_waitqueue_monitor);
+
 /**
  * unlock_page - unlock a locked page
  * @page: the page
@@ -2235,6 +2247,92 @@ zero_length_segment:
 }
 EXPORT_SYMBOL(generic_file_buffered_write);
 
+/*
+ * This writes the data from the source page to the specified page offset in
+ * the nominated file
+ * - the source page does not need to have any association with the file or the
+ *   page offset
+ */
+int
+generic_file_buffered_write_one_kernel_page(struct address_space *mapping,
+					    pgoff_t index,
+					    struct page *src)
+{
+	const struct address_space_operations *a_ops = mapping->a_ops;
+	struct pagevec	lru_pvec;
+	struct page *page, *cached_page = NULL;
+	long status = 0;
+
+	pagevec_init(&lru_pvec, 0);
+
+#if 0
+	if (mapping->tree_lock.magic != RWLOCK_MAGIC)
+		printk("RWLOCK magic incorrect: %x != %x\n",
+		       mapping->tree_lock.magic, RWLOCK_MAGIC);
+#endif
+
+	page = __grab_cache_page(mapping, index, &cached_page, &lru_pvec);
+	if (!page) {
+		BUG_ON(cached_page);
+		return -ENOMEM;
+	}
+
+	status = a_ops->prepare_write(NULL, page, 0, PAGE_CACHE_SIZE);
+	if (unlikely(status)) {
+		loff_t isize = i_size_read(mapping->host);
+
+		if (status != AOP_TRUNCATED_PAGE)
+			unlock_page(page);
+		page_cache_release(page);
+		if (status == AOP_TRUNCATED_PAGE)
+			goto sync;
+
+		/* prepare_write() may have instantiated a few blocks outside
+		 * i_size.  Trim these off again.
+		 */
+		if ((1ULL << (index + 1)) > isize)
+			vmtruncate(mapping->host, isize);
+		goto sync;
+	}
+
+	copy_highpage(page, src);
+	flush_dcache_page(page);
+
+	status = a_ops->commit_write(NULL, page, 0, PAGE_CACHE_SIZE);
+	if (status == AOP_TRUNCATED_PAGE) {
+		page_cache_release(page);
+		goto sync;
+	}
+
+	if (status > 0)
+		status = 0;
+
+	unlock_page(page);
+	mark_page_accessed(page);
+	page_cache_release(page);
+	if (status < 0)
+		return status;
+
+	balance_dirty_pages_ratelimited(mapping);
+	cond_resched();
+
+sync:
+	if (cached_page)
+		page_cache_release(cached_page);
+
+	/* the caller must handle O_SYNC themselves, but we handle S_SYNC and
+	 * MS_SYNCHRONOUS here */
+	if (unlikely(IS_SYNC(mapping->host)) && !a_ops->writepage)
+		status = generic_osync_inode(mapping->host, mapping,
+					     OSYNC_METADATA | OSYNC_DATA);
+
+	/* the caller must handle O_DIRECT for themselves */
+
+	pagevec_lru_add(&lru_pvec);
+	return status;
+}
+EXPORT_SYMBOL(generic_file_buffered_write_one_kernel_page);
+
 static ssize_t
 __generic_file_aio_write_nolock(struct kiocb *iocb, const struct iovec *iov,
 				unsigned long nr_segs, loff_t *ppos)

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 5/7] NFS: Use local caching [try #12]
  2006-08-18 15:35 ` [PATCH 5/7] NFS: Use local caching " David Howells
@ 2006-08-18 15:43   ` Chuck Lever
  2006-08-18 18:41     ` Jeff Garzik
  2006-08-18 15:51   ` David Howells
  1 sibling, 1 reply; 11+ messages in thread
From: Chuck Lever @ 2006-08-18 15:43 UTC (permalink / raw)
  To: David Howells
  Cc: akpm, linux-kernel, nfsv4, trond.myklebust, torvalds,
	linux-cachefs, linux-fsdevel

Hi David-

On 8/18/06, David Howells <dhowells@redhat.com> wrote:
> The attached patch makes it possible for the NFS filesystem to make use of the
> network filesystem local caching service (FS-Cache).
>
> To be able to use this, an updated mount program is required.  This can be
> obtained from:
>
>         http://people.redhat.com/steved/cachefs/util-linux/
>
> To mount an NFS filesystem to use caching, add an "fsc" option to the mount:
>
>         mount warthog:/ /a -o fsc
>
> Signed-Off-By: David Howells <dhowells@redhat.com>
> ---
>
>  fs/Kconfig                 |    7 +
>  fs/nfs/Makefile            |    1
>  fs/nfs/client.c            |   11 +
>  fs/nfs/file.c              |   49 ++++-
>  fs/nfs/fscache.c           |  348 ++++++++++++++++++++++++++++++++
>  fs/nfs/fscache.h           |  476 ++++++++++++++++++++++++++++++++++++++++++++
>  fs/nfs/inode.c             |   21 ++
>  fs/nfs/internal.h          |   32 +++
>  fs/nfs/pagelist.c          |    3
>  fs/nfs/read.c              |   30 +++
>  fs/nfs/super.c             |    1
>  fs/nfs/sysctl.c            |   43 ++++
>  fs/nfs/write.c             |   11 +
>  include/linux/nfs4_mount.h |    1
>  include/linux/nfs_fs.h     |    5
>  include/linux/nfs_fs_sb.h  |    5
>  include/linux/nfs_mount.h  |    1
>  17 files changed, 1035 insertions(+), 10 deletions(-)
>
> diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
> new file mode 100644
> index 0000000..94d5e3a
> --- /dev/null
> +++ b/fs/nfs/fscache.c
> @@ -0,0 +1,348 @@
> +/* fscache.c: NFS filesystem cache interface
> + *
> + * Copyright (C) 2006 Red Hat, Inc. All Rights Reserved.
> + * Written by David Howells (dhowells@redhat.com)
> + *

> +
> +static uint16_t nfs_server_get_key(const void *cookie_netfs_data,
> +                                  void *buffer, uint16_t bufmax)
> +{

Why don't you use the function declaration style that is used in the
rest of the NFS client?  All the parameters belong on one line, don't
they?

-- 
"We who cut mere stones must always be envisioning cathedrals"
   -- Quarry worker's creed

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 5/7] NFS: Use local caching [try #12]
  2006-08-18 15:35 ` [PATCH 5/7] NFS: Use local caching " David Howells
  2006-08-18 15:43   ` Chuck Lever
@ 2006-08-18 15:51   ` David Howells
  1 sibling, 0 replies; 11+ messages in thread
From: David Howells @ 2006-08-18 15:51 UTC (permalink / raw)
  To: Chuck Lever
  Cc: David Howells, torvalds, akpm, steved, trond.myklebust,
	linux-fsdevel, linux-cachefs, nfsv4, linux-kernel

Chuck Lever <chucklever@gmail.com> wrote:

> > +static uint16_t nfs_server_get_key(const void *cookie_netfs_data,
> > +                                  void *buffer, uint16_t bufmax)
> > +{
> 
> Why don't you use the function declaration style that is used in the
> rest of the NFS client?

Actually, the NFS client has several different styles, and mine's not without
precedent, eg:

extern int nfs3_setxattr(struct dentry *, const char *,
			const void *, size_t, int);

extern ssize_t nfs_direct_IO(int, struct kiocb *, const struct iovec *, loff_t,
			unsigned long);

static inline loff_t
nfs_size_to_loff_t(__u64 size)

> All the parameters belong on one line, don't they?

Depends how much you want to upset those people who tremble with anxiety at
the sight of a line longer than 80 chars.

David

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 5/7] NFS: Use local caching [try #12]
  2006-08-18 15:43   ` Chuck Lever
@ 2006-08-18 18:41     ` Jeff Garzik
  0 siblings, 0 replies; 11+ messages in thread
From: Jeff Garzik @ 2006-08-18 18:41 UTC (permalink / raw)
  To: Chuck Lever
  Cc: David Howells, torvalds, akpm, steved, trond.myklebust,
	linux-fsdevel, linux-cachefs, nfsv4, linux-kernel

Chuck Lever wrote:
> Hi David-
> 
> On 8/18/06, David Howells <dhowells@redhat.com> wrote:
>> The attached patch makes it possible for the NFS filesystem to make 
>> use of the
>> network filesystem local caching service (FS-Cache).
>>
>> To be able to use this, an updated mount program is required.  This 
>> can be
>> obtained from:
>>
>>         http://people.redhat.com/steved/cachefs/util-linux/
>>
>> To mount an NFS filesystem to use caching, add an "fsc" option to the 
>> mount:
>>
>>         mount warthog:/ /a -o fsc
>>
>> Signed-Off-By: David Howells <dhowells@redhat.com>
>> ---
>>
>>  fs/Kconfig                 |    7 +
>>  fs/nfs/Makefile            |    1
>>  fs/nfs/client.c            |   11 +
>>  fs/nfs/file.c              |   49 ++++-
>>  fs/nfs/fscache.c           |  348 ++++++++++++++++++++++++++++++++
>>  fs/nfs/fscache.h           |  476 
>> ++++++++++++++++++++++++++++++++++++++++++++
>>  fs/nfs/inode.c             |   21 ++
>>  fs/nfs/internal.h          |   32 +++
>>  fs/nfs/pagelist.c          |    3
>>  fs/nfs/read.c              |   30 +++
>>  fs/nfs/super.c             |    1
>>  fs/nfs/sysctl.c            |   43 ++++
>>  fs/nfs/write.c             |   11 +
>>  include/linux/nfs4_mount.h |    1
>>  include/linux/nfs_fs.h     |    5
>>  include/linux/nfs_fs_sb.h  |    5
>>  include/linux/nfs_mount.h  |    1
>>  17 files changed, 1035 insertions(+), 10 deletions(-)
>>
>> diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
>> new file mode 100644
>> index 0000000..94d5e3a
>> --- /dev/null
>> +++ b/fs/nfs/fscache.c
>> @@ -0,0 +1,348 @@
>> +/* fscache.c: NFS filesystem cache interface
>> + *
>> + * Copyright (C) 2006 Red Hat, Inc. All Rights Reserved.
>> + * Written by David Howells (dhowells@redhat.com)
>> + *
> 
>> +
>> +static uint16_t nfs_server_get_key(const void *cookie_netfs_data,
>> +                                  void *buffer, uint16_t bufmax)
>> +{
> 
> Why don't you use the function declaration style that is used in the
> rest of the NFS client?  All the parameters belong on one line, don't
> they?

Normally, one wraps the line if it exceeds 80 columns...

	Jeff




^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2006-08-18 18:42 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-08-18 15:35 [PATCH 0/7] Permit filesystem local caching and NFS superblock sharing [try #12] David Howells
2006-08-18 15:35 ` [PATCH 1/7] FS-Cache: Provide a filesystem-specific sync'able page bit " David Howells
2006-08-18 15:35 ` [PATCH 2/7] FS-Cache: Generic filesystem caching facility " David Howells
2006-08-18 15:35 ` [PATCH 3/7] FS-Cache: Release page->private after failed readahead " David Howells
2006-08-18 15:35 ` [PATCH 4/7] FS-Cache: Make kAFS use FS-Cache " David Howells
2006-08-18 15:35 ` [PATCH 5/7] NFS: Use local caching " David Howells
2006-08-18 15:43   ` Chuck Lever
2006-08-18 18:41     ` Jeff Garzik
2006-08-18 15:51   ` David Howells
2006-08-18 15:35 ` [PATCH 6/7] FS-Cache: CacheFiles: ia64: missing copy_page export " David Howells
2006-08-18 15:35 ` [PATCH 7/7] FS-Cache: CacheFiles: A cache that backs onto a mounted filesystem " David Howells

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).