linux-fsdevel.vger.kernel.org archive mirror
* NFS Patch for FSCache
@ 2005-05-09 10:31 Steve Dickson
  2005-05-09 21:19 ` Andrew Morton
  2005-06-13 12:52 ` Steve Dickson
  0 siblings, 2 replies; 10+ messages in thread
From: Steve Dickson @ 2005-05-09 10:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-fsdevel, linux-cachefs

[-- Attachment #1: Type: text/plain, Size: 2089 bytes --]

Hello,

Attached is a patch that enables NFS to use David Howells'
File System Caching implementation (FSCache). Also attached
are two supplemental patches that are needed to fix two oopses
that were found during debugging. (Note: these patches are also
in people.redhat.com/steved/cachefs/2.6.12-rc3-mm3/)

2.6.12-rc3-mm3-nfs-fscache.patch - David and I have
been working on this for some time now and the code
seems to be pretty solid. One issue is Trond's dislike
of how the NFS code is dependent on FSCache calls. I did
look into changing this, but it's not clear (at least to
me) how we could make things better... But that's
something that will need to be addressed.

The second issue is what we've been calling "NFS aliasing".
The fact that two mounted NFS super blocks can point to
the same page causes major fits for FSC. David has
proposed some patches to resolve this issue that are still
under review. But at this point, to stop a BUG() from popping
when a second NFS filesystem is mounted, the
2.6.12-rc3-mm3-fscache-cookie-exist.patch is needed.

The final patch, 2.6.12-rc3-mm3-cachefs-wb.patch, is needed
to stop another BUG() from popping during NFS reads.

NFS uses FSCache on a per-mount basis, which means a new
mount flag, 'fsc', is needed to activate the caching.
Example:
     mount -t nfs4 -o fsc server:/home /mnt/server/home

(Note: people.redhat.com/steved/cachefs/util-linux/ has the
util-linux binary and source rpms with the fsc support).

To set up a cachefs partition, first initialize
the disk partition:
     echo "cachefs___" >/dev/hdg9
then mount it:
     mount -t cachefs /dev/hdg9 /cache-hdg9

See Documentation/filesystems/caching in the kernel
source for more details.
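
Putting the pieces together, the whole setup sequence looks like
this (the device name and mount points are examples; the cache
partition presumably needs to be mounted before any 'fsc' NFS mounts):

```shell
# one-time: initialize the cache partition (destroys its contents)
echo "cachefs___" > /dev/hdg9

# mount the cache first...
mount -t cachefs /dev/hdg9 /cache-hdg9

# ...then mount NFS with local caching enabled
mount -t nfs4 -o fsc server:/home /mnt/server/home
```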

I'm hopeful that you'll add these patches to your tree
so they will get some much needed testing. I'm also going
to be pushing to get the caching code into a Fedora Core
kernel, but due to the dependency on David's new vm_ops
method, page_mkwrite, this might take some time...
(Note: people.redhat.com/steved/cachefs/mmpatches has all of the
current mm patches)

Comments?

steved.


[-- Attachment #2: 2.6.12-rc3-mm3-nfs-fscache.patch --]
[-- Type: text/x-patch, Size: 26573 bytes --]

This patch enables NFS to use file system caching (i.e. FSCache).
To turn this feature on, you must specify the -o fsc mount flag
and have a cachefs partition mounted.

Signed-off-by: Steve Dickson <steved@redhat.com>

--- 2.6.12-rc2-mm3/fs/nfs/file.c.orig	2005-04-23 10:13:24.000000000 -0400
+++ 2.6.12-rc2-mm3/fs/nfs/file.c	2005-04-23 11:25:47.000000000 -0400
@@ -27,9 +27,11 @@
 #include <linux/slab.h>
 #include <linux/pagemap.h>
 #include <linux/smp_lock.h>
+#include <linux/buffer_head.h>
 
 #include <asm/uaccess.h>
 #include <asm/system.h>
+#include "nfs-fscache.h"
 
 #include "delegation.h"
 
@@ -194,6 +196,12 @@ nfs_file_sendfile(struct file *filp, lof
 	return res;
 }
 
+static int nfs_file_page_mkwrite(struct vm_area_struct *vma, struct page *page)
+{
+	wait_on_page_fs_misc(page);
+	return 0;
+}
+
 static int
 nfs_file_mmap(struct file * file, struct vm_area_struct * vma)
 {
@@ -207,6 +215,10 @@ nfs_file_mmap(struct file * file, struct
 	status = nfs_revalidate_inode(NFS_SERVER(inode), inode);
 	if (!status)
 		status = generic_file_mmap(file, vma);
+
+	if (NFS_SERVER(inode)->flags & NFS_MOUNT_FSCACHE)
+		vma->vm_ops->page_mkwrite = nfs_file_page_mkwrite;
+
 	return status;
 }
 
@@ -258,6 +270,11 @@ static int nfs_commit_write(struct file 
 	return status;
 }
 
+/*
+ * since we use page->private for our own nefarious purposes when using fscache, we have to
+ * override extra address space ops to prevent fs/buffer.c from getting confused, even though we
+ * may not have asked its opinion
+ */
 struct address_space_operations nfs_file_aops = {
 	.readpage = nfs_readpage,
 	.readpages = nfs_readpages,
@@ -269,6 +286,11 @@ struct address_space_operations nfs_file
 #ifdef CONFIG_NFS_DIRECTIO
 	.direct_IO = nfs_direct_IO,
 #endif
+#ifdef CONFIG_NFS_FSCACHE
+	.sync_page	= block_sync_page,
+	.releasepage	= nfs_releasepage,
+	.invalidatepage	= nfs_invalidatepage,
+#endif
 };
 
 /* 
--- 2.6.12-rc2-mm3/fs/nfs/inode.c.orig	2005-04-23 10:13:24.000000000 -0400
+++ 2.6.12-rc2-mm3/fs/nfs/inode.c	2005-04-23 17:51:57.000000000 -0400
@@ -42,6 +42,8 @@
 #include "nfs4_fs.h"
 #include "delegation.h"
 
+#include "nfs-fscache.h"
+
 #define NFSDBG_FACILITY		NFSDBG_VFS
 #define NFS_PARANOIA 1
 
@@ -169,6 +171,10 @@ nfs_clear_inode(struct inode *inode)
 	cred = nfsi->cache_access.cred;
 	if (cred)
 		put_rpccred(cred);
+
+	if (NFS_SERVER(inode)->flags & NFS_MOUNT_FSCACHE)
+		nfs_clear_fscookie(nfsi);
+
 	BUG_ON(atomic_read(&nfsi->data_updates) != 0);
 }
 
@@ -503,6 +509,9 @@ nfs_fill_super(struct super_block *sb, s
 			server->namelen = NFS2_MAXNAMLEN;
 	}
 
+	if (server->flags & NFS_MOUNT_FSCACHE)
+		nfs_fill_fscookie(sb);
+
 	sb->s_op = &nfs_sops;
 	return nfs_sb_init(sb, authflavor);
 }
@@ -579,6 +588,7 @@ static int nfs_show_options(struct seq_f
 		{ NFS_MOUNT_NOAC, ",noac", "" },
 		{ NFS_MOUNT_NONLM, ",nolock", ",lock" },
 		{ NFS_MOUNT_NOACL, ",noacl", "" },
+		{ NFS_MOUNT_FSCACHE, ",fscache", "" },
 		{ 0, NULL, NULL }
 	};
 	struct proc_nfs_info *nfs_infop;
@@ -623,6 +633,9 @@ nfs_zap_caches(struct inode *inode)
 		nfsi->flags |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA|NFS_INO_INVALID_ACCESS|NFS_INO_INVALID_ACL;
 	else
 		nfsi->flags |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_ACCESS|NFS_INO_INVALID_ACL;
+
+	if (NFS_SERVER(inode)->flags & NFS_MOUNT_FSCACHE)
+		nfs_zap_fscookie(nfsi);
 }
 
 static void nfs_zap_acl_cache(struct inode *inode)
@@ -770,6 +783,9 @@ nfs_fhget(struct super_block *sb, struct
 		memset(nfsi->cookieverf, 0, sizeof(nfsi->cookieverf));
 		nfsi->cache_access.cred = NULL;
 
+		if (NFS_SB(sb)->flags & NFS_MOUNT_FSCACHE)
+			nfs_fhget_fscookie(sb, nfsi);
+
 		unlock_new_inode(inode);
 	} else
 		nfs_refresh_inode(inode, fattr);
@@ -1076,6 +1092,9 @@ __nfs_revalidate_inode(struct nfs_server
 				(long long)NFS_FILEID(inode));
 		/* This ensures we revalidate dentries */
 		nfsi->cache_change_attribute++;
+
+		if (server->flags & NFS_MOUNT_FSCACHE)
+			nfs_renew_fscookie(server, nfsi);
 	}
 	if (flags & NFS_INO_INVALID_ACL)
 		nfs_zap_acl_cache(inode);
@@ -1515,6 +1534,14 @@ static struct super_block *nfs_get_sb(st
 		goto out_err;
 	}
 
+#ifndef CONFIG_NFS_FSCACHE
+	if (data->flags & NFS_MOUNT_FSCACHE) {
+		printk(KERN_WARNING "NFS: kernel not compiled with CONFIG_NFS_FSCACHE\n");
+		kfree(server);
+		return ERR_PTR(-EINVAL);
+	}
+#endif
+
 	s = sget(fs_type, nfs_compare_super, nfs_set_super, server);
 	if (IS_ERR(s) || s->s_root)
 		goto out_rpciod_down;
@@ -1542,6 +1569,9 @@ static void nfs_kill_super(struct super_
 
 	kill_anon_super(s);
 
+	if (server->flags & NFS_MOUNT_FSCACHE)
+		nfs_kill_fscookie(server);
+
 	if (server->client != NULL && !IS_ERR(server->client))
 		rpc_shutdown_client(server->client);
 	if (server->client_sys != NULL && !IS_ERR(server->client_sys))
@@ -1760,6 +1790,9 @@ static int nfs4_fill_super(struct super_
 
 	sb->s_time_gran = 1;
 
+	if (server->flags & NFS4_MOUNT_FSCACHE)
+		nfs4_fill_fscookie(sb);
+
 	sb->s_op = &nfs4_sops;
 	err = nfs_sb_init(sb, authflavour);
 	if (err == 0)
@@ -1903,6 +1936,9 @@ static void nfs4_kill_super(struct super
 	nfs_return_all_delegations(sb);
 	kill_anon_super(sb);
 
+	if (server->flags & NFS_MOUNT_FSCACHE)
+		nfs_kill_fscookie(server);
+
 	nfs4_renewd_prepare_shutdown(server);
 
 	if (server->client != NULL && !IS_ERR(server->client))
@@ -2021,6 +2057,11 @@ static int __init init_nfs_fs(void)
 {
 	int err;
 
+	/* we want to be able to cache */
+	err = nfs_register_netfs();
+	if (err < 0)
+		goto out5;
+
 	err = nfs_init_nfspagecache();
 	if (err)
 		goto out4;
@@ -2068,6 +2109,9 @@ out2:
 out3:
 	nfs_destroy_nfspagecache();
 out4:
+	nfs_unregister_netfs();
+out5:
+
 	return err;
 }
 
@@ -2080,6 +2124,7 @@ static void __exit exit_nfs_fs(void)
 	nfs_destroy_readpagecache();
 	nfs_destroy_inodecache();
 	nfs_destroy_nfspagecache();
+	nfs_unregister_netfs();
 #ifdef CONFIG_PROC_FS
 	rpc_proc_unregister("nfs");
 #endif
--- 2.6.12-rc2-mm3/fs/nfs/Makefile.orig	2005-04-23 10:13:24.000000000 -0400
+++ 2.6.12-rc2-mm3/fs/nfs/Makefile	2005-04-23 11:25:47.000000000 -0400
@@ -13,4 +13,5 @@ nfs-$(CONFIG_NFS_V4)	+= nfs4proc.o nfs4x
 			   delegation.o idmap.o \
 			   callback.o callback_xdr.o callback_proc.o
 nfs-$(CONFIG_NFS_DIRECTIO) += direct.o
+nfs-$(CONFIG_NFS_FSCACHE) += nfs-fscache.o
 nfs-objs		:= $(nfs-y)
--- /dev/null	2005-03-28 14:47:10.233040208 -0500
+++ 2.6.12-rc2-mm3/fs/nfs/nfs-fscache.c	2005-04-23 15:14:02.000000000 -0400
@@ -0,0 +1,191 @@
+/* nfs-fscache.c: NFS filesystem cache interface
+ *
+ * Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+
+#include <linux/config.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/nfs_fs.h>
+#include <linux/nfs_fs_sb.h>
+
+#include "nfs-fscache.h"
+
+#define NFS_CACHE_FH_INDEX_SIZE sizeof(struct nfs_fh)
+
+/*
+ * the root index is
+ */
+static struct fscache_page *nfs_cache_get_page_token(struct page *page);
+
+static struct fscache_netfs_operations nfs_cache_ops = {
+	.get_page_token	= nfs_cache_get_page_token,
+};
+
+struct fscache_netfs nfs_cache_netfs = {
+	.name			= "nfs",
+	.version		= 0,
+	.ops			= &nfs_cache_ops,
+};
+
+/*
+ * the root index for the filesystem is defined by nfsd IP address and ports
+ */
+static fscache_match_val_t nfs_cache_server_match(void *target,
+						  const void *entry);
+static void nfs_cache_server_update(void *source, void *entry);
+
+struct fscache_index_def nfs_cache_server_index_def = {
+	.name			= "servers",
+	.data_size		= 18,
+	.keys[0]		= { FSCACHE_INDEX_KEYS_IPV6ADDR, 16 },
+	.keys[1]		= { FSCACHE_INDEX_KEYS_BIN, 2 },
+	.match			= nfs_cache_server_match,
+	.update			= nfs_cache_server_update,
+};
+
+/*
+ * the primary index for each server is simply made up of a series of NFS file
+ * handles
+ */
+static fscache_match_val_t nfs_cache_fh_match(void *target, const void *entry);
+static void nfs_cache_fh_update(void *source, void *entry);
+
+struct fscache_index_def nfs_cache_fh_index_def = {
+	.name			= "fh",
+	.data_size		= NFS_CACHE_FH_INDEX_SIZE,
+	.keys[0]		= { FSCACHE_INDEX_KEYS_BIN_SZ2,
+				    sizeof(struct nfs_fh) },
+	.match			= nfs_cache_fh_match,
+	.update			= nfs_cache_fh_update,
+};
+
+/*
+ * get a page token for the specified page
+ * - the token will be attached to page->private and PG_private will be set on
+ *   the page
+ */
+static struct fscache_page *nfs_cache_get_page_token(struct page *page)
+{
+	return fscache_page_get_private(page, GFP_NOIO);
+}
+
+static const uint8_t nfs_cache_ipv6_wrapper_for_ipv4[12] = {
+	[0 ... 9]	= 0x00,
+	[10 ... 11]	= 0xff
+};
+
+/*
+ * match a server record obtained from the cache
+ */
+static fscache_match_val_t nfs_cache_server_match(void *target,
+						  const void *entry)
+{
+	struct nfs_server *server = target;
+	const uint8_t *data = entry;
+
+	switch (server->addr.sin_family) {
+	case AF_INET:
+		if (memcmp(data + 0,
+			   &nfs_cache_ipv6_wrapper_for_ipv4,
+			   12) != 0)
+			break;
+
+		if (memcmp(data + 12, &server->addr.sin_addr, 4) != 0)
+			break;
+
+		if (memcmp(data + 16, &server->addr.sin_port, 2) != 0)
+			break;
+
+		return FSCACHE_MATCH_SUCCESS;
+
+	case AF_INET6:
+		if (memcmp(data + 0, &server->addr.sin_addr, 16) != 0)
+			break;
+
+		if (memcmp(data + 16, &server->addr.sin_port, 2) != 0)
+			break;
+
+		return FSCACHE_MATCH_SUCCESS;
+
+	default:
+		break;
+	}
+
+	return FSCACHE_MATCH_FAILED;
+}
+
+/*
+ * update a server record in the cache
+ */
+static void nfs_cache_server_update(void *source, void *entry)
+{
+	struct nfs_server *server = source;
+	uint8_t *data = entry;
+
+	switch (server->addr.sin_family) {
+	case AF_INET:
+		memcpy(data + 0, &nfs_cache_ipv6_wrapper_for_ipv4, 12);
+		memcpy(data + 12, &server->addr.sin_addr, 4);
+		memcpy(data + 16, &server->addr.sin_port, 2);
+		return;
+
+	case AF_INET6:
+		memcpy(data + 0, &server->addr.sin_addr, 16);
+		memcpy(data + 16, &server->addr.sin_port, 2);
+		return;
+
+	default:
+		return;
+	}
+}
+
+/*
+ * match a file handle record obtained from the cache
+ */
+static fscache_match_val_t nfs_cache_fh_match(void *target, const void *entry)
+{
+	struct nfs_inode *nfsi = target;
+	const uint8_t *data = entry;
+	uint16_t nsize;
+
+	/* check the file handle matches */
+	memcpy(&nsize, data, 2);
+	nsize = ntohs(nsize);
+
+	if (nsize <= NFS_CACHE_FH_INDEX_SIZE && nfsi->fh.size == nsize) {
+		if (memcmp(data + 2, nfsi->fh.data, nsize) == 0) {
+			return FSCACHE_MATCH_SUCCESS;
+		}
+	}
+
+	return FSCACHE_MATCH_FAILED;
+}
+
+/*
+ * update a fh record in the cache
+ */
+static void nfs_cache_fh_update(void *source, void *entry)
+{
+	struct nfs_inode *nfsi = source;
+	uint16_t nsize;
+	uint8_t *data = entry;
+
+	BUG_ON(nfsi->fh.size > NFS_CACHE_FH_INDEX_SIZE - 2);
+
+	/* set the file handle */
+	nsize = htons(nfsi->fh.size);
+	memcpy(data, &nsize, 2);
+	memcpy(data + 2, &nfsi->fh.data, nfsi->fh.size);
+	memset(data + 2 + nfsi->fh.size,
+	       FSCACHE_INDEX_DEADFILL_PATTERN,
+	       NFS_CACHE_FH_INDEX_SIZE - 2 - nfsi->fh.size);
+}
--- /dev/null	2005-03-28 14:47:10.233040208 -0500
+++ 2.6.12-rc2-mm3/fs/nfs/nfs-fscache.h	2005-04-23 17:51:15.000000000 -0400
@@ -0,0 +1,158 @@
+/* nfs-fscache.h: NFS filesystem cache interface definitions
+ *
+ * Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _NFS_FSCACHE_H
+#define _NFS_FSCACHE_H
+
+#include <linux/nfs_mount.h>
+#include <linux/nfs4_mount.h>
+#include <linux/fscache.h>
+
+#ifdef CONFIG_NFS_FSCACHE
+#ifndef CONFIG_FSCACHE
+#error "CONFIG_NFS_FSCACHE is defined but not CONFIG_FSCACHE"
+#endif
+
+extern struct fscache_netfs nfs_cache_netfs;
+extern struct fscache_index_def nfs_cache_server_index_def;
+extern struct fscache_index_def nfs_cache_fh_index_def;
+
+extern int nfs_invalidatepage(struct page *, unsigned long);
+extern int nfs_releasepage(struct page *, int);
+extern int nfs_mkwrite(struct page *);
+
+static inline void 
+nfs_renew_fscookie(struct nfs_server *server, struct nfs_inode *nfsi)
+{
+	struct fscache_cookie *old =  nfsi->fscache;
+
+	/* retire the current fscache cache and get a new one */
+	fscache_relinquish_cookie(nfsi->fscache, 1);
+	nfsi->fscache = fscache_acquire_cookie(server->fscache, NULL, nfsi);
+
+	dfprintk(FSCACHE,
+		"NFS: revalidation new cookie (0x%p/0x%p/0x%p/0x%p)\n",
+		server, nfsi, old, nfsi->fscache);
+
+	return;
+}
+static inline void
+nfs4_fill_fscookie(struct super_block *sb)
+{
+	struct nfs_server *server = NFS_SB(sb);
+
+	/* create a cache index for looking up filehandles */
+	server->fscache = fscache_acquire_cookie(nfs_cache_netfs.primary_index,
+			       &nfs_cache_fh_index_def, server);
+	if (server->fscache == NULL) {
+		printk(KERN_WARNING "NFS4: No Fscache cookie. Turning Fscache off!\n");
+	} else /* reuse the NFS mount option */
+		server->flags |= NFS_MOUNT_FSCACHE;
+
+	dfprintk(FSCACHE,"NFS: nfs4 cookie (0x%p,0x%p/0x%p)\n", 
+		sb, server, server->fscache);
+
+	return;
+}
+static inline void
+nfs_fill_fscookie(struct super_block *sb)
+{
+	struct nfs_server *server = NFS_SB(sb);
+
+	/* create a cache index for looking up filehandles */
+	server->fscache = fscache_acquire_cookie(nfs_cache_netfs.primary_index,
+			      &nfs_cache_fh_index_def, server);
+	if (server->fscache == NULL) {
+		server->flags &= ~NFS_MOUNT_FSCACHE;
+		printk(KERN_WARNING "NFS: No Fscache cookie. Turning Fscache off!\n");
+	}
+	dfprintk(FSCACHE,"NFS: cookie (0x%p/0x%p/0x%p)\n", 
+		sb, server, server->fscache);
+
+	return;
+}
+static inline void
+nfs_fhget_fscookie(struct super_block *sb, struct nfs_inode *nfsi)
+{
+	struct nfs_server *server = NFS_SB(sb);
+
+	nfsi->fscache = fscache_acquire_cookie(server->fscache, NULL, nfsi);
+	if (server->fscache == NULL)
+		printk(KERN_WARNING "NFS: NULL FScache cookie: sb 0x%p nfsi 0x%p\n", sb, nfsi);
+
+	dfprintk(FSCACHE, "NFS: fhget new cookie (0x%p/0x%p/0x%p)\n",
+		sb, nfsi, nfsi->fscache);
+
+	return;
+}
+static inline void 
+nfs_kill_fscookie(struct nfs_server *server)
+{
+	dfprintk(FSCACHE,"NFS: killing cookie (0x%p/0x%p)\n",
+		server, server->fscache);
+
+	fscache_relinquish_cookie(server->fscache, 0);
+	server->fscache = NULL;
+
+	return;
+}
+static inline void
+nfs_clear_fscookie(struct nfs_inode *nfsi)
+{
+	dfprintk(FSCACHE, "NFS: clear cookie (0x%p/0x%p)\n",
+			nfsi, nfsi->fscache);
+
+	fscache_relinquish_cookie(nfsi->fscache, 0);
+	nfsi->fscache = NULL;
+
+	return;
+}
+static inline void
+nfs_zap_fscookie(struct nfs_inode *nfsi)
+{
+	dfprintk(FSCACHE,"NFS: zapping cookie (0x%p/0x%p)\n",
+		nfsi, nfsi->fscache);
+
+	fscache_relinquish_cookie(nfsi->fscache, 1);
+	nfsi->fscache = NULL;
+
+	return;
+}
+static inline int
+nfs_register_netfs(void)
+{
+	int err;
+
+	err = fscache_register_netfs(&nfs_cache_netfs, &nfs_cache_server_index_def);
+
+	return err;
+}
+static inline void
+nfs_unregister_netfs(void)
+{
+	fscache_unregister_netfs(&nfs_cache_netfs);
+
+	return;
+}
+#else
+static inline void nfs_fill_fscookie(struct super_block *sb) {}
+static inline void nfs_fhget_fscookie(struct super_block *sb, struct nfs_inode *nfsi) {}
+static inline void nfs4_fill_fscookie(struct super_block *sb) {}
+static inline void nfs_kill_fscookie(struct nfs_server *server) {}
+static inline void nfs_clear_fscookie(struct nfs_inode *nfsi) {}
+static inline void nfs_zap_fscookie(struct nfs_inode *nfsi) {}
+static inline void 
+	nfs_renew_fscookie(struct nfs_server *server, struct nfs_inode *nfsi) {}
+static inline int nfs_register_netfs() { return 0; }
+static inline void nfs_unregister_netfs() {}
+
+#endif
+#endif /* _NFS_FSCACHE_H */
--- 2.6.12-rc2-mm3/fs/nfs/read.c.orig	2005-04-23 10:13:25.000000000 -0400
+++ 2.6.12-rc2-mm3/fs/nfs/read.c	2005-04-23 11:25:47.000000000 -0400
@@ -27,6 +27,7 @@
 #include <linux/sunrpc/clnt.h>
 #include <linux/nfs_fs.h>
 #include <linux/nfs_page.h>
+#include <linux/nfs_mount.h>
 #include <linux/smp_lock.h>
 
 #include <asm/system.h>
@@ -73,6 +74,47 @@ int nfs_return_empty_page(struct page *p
 	return 0;
 }
 
+#ifdef CONFIG_NFS_FSCACHE
+/*
+ * store a newly fetched page in fscache
+ */
+static void
+nfs_readpage_to_fscache_complete(void *cookie_data, struct page *page, void *data, int error)
+{
+	dprintk("NFS:     readpage_to_fscache_complete (%p/%p/%p/%d)\n",
+		cookie_data, page, data, error);
+
+	end_page_fs_misc(page);
+}
+
+static inline void
+nfs_readpage_to_fscache(struct inode *inode, struct page *page, int sync)
+{
+	int ret;
+
+	dprintk("NFS: readpage_to_fscache(0x%p/0x%p/0x%p/%d)\n",
+		NFS_I(inode)->fscache, page, inode, sync);
+
+	SetPageFsMisc(page);
+	ret = fscache_write_page(NFS_I(inode)->fscache, page,
+		nfs_readpage_to_fscache_complete, NULL, GFP_KERNEL);
+	if (ret != 0) {
+		dprintk("NFS:     readpage_to_fscache: error %d\n", ret);
+		fscache_uncache_page(NFS_I(inode)->fscache, page);
+		ClearPageFsMisc(page);
+	}
+
+	unlock_page(page);
+}
+#else
+static inline void
+nfs_readpage_to_fscache(struct inode *inode, struct page *page, int sync)
+{
+	BUG();
+}
+#endif
+
+
 /*
  * Read a page synchronously.
  */
@@ -149,6 +191,13 @@ static int nfs_readpage_sync(struct nfs_
 		ClearPageError(page);
 	result = 0;
 
+	if (NFS_SERVER(inode)->flags & NFS_MOUNT_FSCACHE)
+		nfs_readpage_to_fscache(inode, page, 1);
+	else
+		unlock_page(page);
+
+	return result;
+
 io_error:
 	unlock_page(page);
 	nfs_readdata_free(rdata);
@@ -180,7 +229,13 @@ static int nfs_readpage_async(struct nfs
 
 static void nfs_readpage_release(struct nfs_page *req)
 {
-	unlock_page(req->wb_page);
+	struct inode *d_inode = req->wb_context->dentry->d_inode;
+
+	if ((NFS_SERVER(d_inode)->flags & NFS_MOUNT_FSCACHE) && 
+			PageUptodate(req->wb_page))
+		nfs_readpage_to_fscache(d_inode, req->wb_page, 0);
+	else
+		unlock_page(req->wb_page);
 
 	nfs_clear_request(req);
 	nfs_release_request(req);
@@ -477,6 +532,67 @@ void nfs_readpage_result(struct rpc_task
 	data->complete(data, status);
 }
 
+
+/*
+ * Read a page through the on-disc cache if possible
+ */
+#ifdef CONFIG_NFS_FSCACHE
+static void
+nfs_readpage_from_fscache_complete(void *cookie_data, struct page *page, void *data, int error)
+{
+	dprintk("NFS: readpage_from_fscache_complete (0x%p/0x%p/0x%p/%d)\n",
+		cookie_data, page, data, error);
+
+	if (error)
+		SetPageError(page);
+	else
+		SetPageUptodate(page);
+
+	unlock_page(page);
+}
+
+static inline int
+nfs_readpage_from_fscache(struct inode *inode, struct page *page)
+{
+	struct fscache_page *pageio;
+	int ret;
+
+	dprintk("NFS: readpage_from_fscache(0x%p/0x%p/0x%p)\n",
+		NFS_I(inode)->fscache, page, inode);
+
+	pageio = fscache_page_get_private(page, GFP_NOIO);
+	if (IS_ERR(pageio)) {
+		 dprintk("NFS:     fscache_page_get_private error %ld\n", PTR_ERR(pageio));
+		return PTR_ERR(pageio);
+	}
+
+	ret = fscache_read_or_alloc_page(NFS_I(inode)->fscache,
+					 page,
+					 nfs_readpage_from_fscache_complete,
+					 NULL,
+					 GFP_KERNEL);
+
+	switch (ret) {
+	case 1: /* read BIO submitted and wb-journal entry found */
+		BUG();
+		
+	case 0: /* read BIO submitted (page in fscache) */
+		return ret;
+
+	case -ENOBUFS: /* inode not in cache */
+	case -ENODATA: /* page not in cache */
+		dprintk("NFS:     fscache_read_or_alloc_page error %d\n", ret);
+		return 1;
+
+	default:
+		return ret;
+	}
+}
+#else
+static inline int 
+nfs_readpage_from_fscache(struct inode *inode, struct page *page) { return 1; }
+#endif
+
 /*
  * Read a page over NFS.
  * We read the page synchronously in the following case:
@@ -510,6 +626,13 @@ int nfs_readpage(struct file *file, stru
 		ctx = get_nfs_open_context((struct nfs_open_context *)
 				file->private_data);
 	if (!IS_SYNC(inode)) {
+		if (NFS_SERVER(inode)->flags & NFS_MOUNT_FSCACHE) {
+			error = nfs_readpage_from_fscache(inode, page);
+			if (error < 0)
+				goto out_error;
+			if (error == 0)
+				return error;
+		}
 		error = nfs_readpage_async(ctx, inode, page);
 		goto out;
 	}
@@ -540,6 +663,15 @@ readpage_async_filler(void *data, struct
 	unsigned int len;
 
 	nfs_wb_page(inode, page);
+
+	if (NFS_SERVER(inode)->flags & NFS_MOUNT_FSCACHE) {
+		int error = nfs_readpage_from_fscache(inode, page);
+		if (error < 0)
+			return error;
+		if (error == 0)
+			return error;
+	}
+
 	len = nfs_page_length(inode, page);
 	if (len == 0)
 		return nfs_return_empty_page(page);
@@ -613,3 +745,61 @@ void nfs_destroy_readpagecache(void)
 	if (kmem_cache_destroy(nfs_rdata_cachep))
 		printk(KERN_INFO "nfs_read_data: not all structures were freed\n");
 }
+
+#ifdef CONFIG_NFS_FSCACHE
+int nfs_invalidatepage(struct page *page, unsigned long offset)
+{
+	int ret = 1;
+	struct nfs_server *server = NFS_SERVER(page->mapping->host);
+
+	BUG_ON(!PageLocked(page));
+
+	if (server->flags & NFS_MOUNT_FSCACHE) {
+		if (PagePrivate(page)) {
+			struct nfs_inode *nfsi = NFS_I(page->mapping->host);
+
+			dfprintk(PAGECACHE,"NFS: fscache invalidatepage (0x%p/0x%p/0x%p)\n",
+				nfsi->fscache, page, nfsi);
+
+			fscache_uncache_page(nfsi->fscache, page);
+
+			if (offset == 0) {
+				BUG_ON(!PageLocked(page));
+				ret = 0;
+				if (!PageWriteback(page))
+					ret = page->mapping->a_ops->releasepage(page, 0);
+			}
+		}
+	} else
+		ret = 0;
+
+	return ret;
+}
+int nfs_releasepage(struct page *page, int gfp_flags)
+{
+	struct fscache_page *pageio;
+	struct nfs_server *server = NFS_SERVER(page->mapping->host);
+
+	if (server->flags & NFS_MOUNT_FSCACHE && PagePrivate(page)) {
+		struct nfs_inode *nfsi = NFS_I(page->mapping->host);
+
+		dfprintk(PAGECACHE,"NFS: fscache releasepage (0x%p/0x%p/0x%p)\n",
+			nfsi->fscache, page, nfsi);
+
+		fscache_uncache_page(nfsi->fscache, page);
+		pageio = (struct fscache_page *) page->private;
+		page->private = 0;
+		ClearPagePrivate(page);
+
+		if (pageio)
+			kfree(pageio);
+	}
+
+	return 0;
+}
+int nfs_mkwrite(struct page *page)
+{
+	wait_on_page_fs_misc(page);
+	return 0;
+}
+#endif
--- 2.6.12-rc2-mm3/fs/nfs/write.c.orig	2005-04-23 10:13:25.000000000 -0400
+++ 2.6.12-rc2-mm3/fs/nfs/write.c	2005-04-23 18:07:11.000000000 -0400
@@ -255,6 +255,38 @@ static int wb_priority(struct writeback_
 }
 
 /*
+ * store an updated page in fscache
+ */
+#ifdef CONFIG_NFS_FSCACHE
+static void
+nfs_writepage_to_fscache_complete(void *cookie_data, struct page *page, void *data, int error)
+{
+	/* really need to synchronise the end of writeback, probably using a page flag */
+}
+static inline void
+nfs_writepage_to_fscache(struct inode *inode, struct page *page)
+{
+	int ret; 
+
+	dprintk("NFS: writepage_to_fscache (0x%p/0x%p/0x%p)\n",
+		NFS_I(inode)->fscache, page, inode);
+
+	ret =  fscache_write_page(NFS_I(inode)->fscache, page,
+		nfs_writepage_to_fscache_complete, NULL, GFP_KERNEL);
+	if (ret != 0) {
+		dprintk("NFS:    fscache_write_page error %d\n", ret);
+		fscache_uncache_page(NFS_I(inode)->fscache, page);
+	}
+}
+#else
+static inline void
+nfs_writepage_to_fscache(struct inode *inode, struct page *page)
+{
+	BUG();
+}
+#endif
+
+/*
  * Write an mmapped page to the server.
  */
 int nfs_writepage(struct page *page, struct writeback_control *wbc)
@@ -299,6 +331,10 @@ do_it:
 		err = -EBADF;
 		goto out;
 	}
+
+	if (NFS_SERVER(inode)->flags & NFS_MOUNT_FSCACHE)
+		nfs_writepage_to_fscache(inode, page);
+
 	lock_kernel();
 	if (!IS_SYNC(inode) && inode_referenced) {
 		err = nfs_writepage_async(ctx, inode, page, 0, offset);
--- 2.6.12-rc2-mm3/fs/Kconfig.orig	2005-04-23 10:13:23.000000000 -0400
+++ 2.6.12-rc2-mm3/fs/Kconfig	2005-04-23 11:25:48.000000000 -0400
@@ -1456,6 +1456,13 @@ config NFS_V4
 
 	  If unsure, say N.
 
+config NFS_FSCACHE
+	bool "Provide NFS client caching support (EXPERIMENTAL)"
+	depends on NFS_FS && FSCACHE && EXPERIMENTAL
+	help
+	  Say Y here if you want NFS data to be cached locally on disc through
+	  the general filesystem cache manager
+
 config NFS_DIRECTIO
 	bool "Allow direct I/O on NFS files (EXPERIMENTAL)"
 	depends on NFS_FS && EXPERIMENTAL
--- 2.6.12-rc2-mm3/include/linux/nfs_fs.h.orig	2005-04-23 10:13:28.000000000 -0400
+++ 2.6.12-rc2-mm3/include/linux/nfs_fs.h	2005-04-23 15:27:22.000000000 -0400
@@ -29,6 +29,7 @@
 #include <linux/nfs_xdr.h>
 #include <linux/rwsem.h>
 #include <linux/mempool.h>
+#include <linux/fscache.h>
 
 /*
  * Enable debugging support for nfs client.
@@ -184,6 +185,11 @@ struct nfs_inode {
 	int			 delegation_state;
 	struct rw_semaphore	rwsem;
 #endif /* CONFIG_NFS_V4*/
+
+#ifdef CONFIG_NFS_FSCACHE
+	struct fscache_cookie	*fscache;
+#endif
+
 	struct inode		vfs_inode;
 };
 
@@ -564,6 +570,7 @@ extern void * nfs_root_data(void);
 #define NFSDBG_FILE		0x0040
 #define NFSDBG_ROOT		0x0080
 #define NFSDBG_CALLBACK		0x0100
+#define NFSDBG_FSCACHE		0x0200
 #define NFSDBG_ALL		0xFFFF
 
 #ifdef __KERNEL__
--- 2.6.12-rc2-mm3/include/linux/nfs_fs_sb.h.orig	2005-04-23 10:13:28.000000000 -0400
+++ 2.6.12-rc2-mm3/include/linux/nfs_fs_sb.h	2005-04-23 11:25:48.000000000 -0400
@@ -3,6 +3,7 @@
 
 #include <linux/list.h>
 #include <linux/backing-dev.h>
+#include <linux/fscache.h>
 
 /*
  * NFS client parameters stored in the superblock.
@@ -47,6 +48,10 @@ struct nfs_server {
 						   that are supported on this
 						   filesystem */
 #endif
+
+#ifdef CONFIG_NFS_FSCACHE
+	struct fscache_cookie	*fscache;	/* cache cookie */
+#endif
 };
 
 /* Server capabilities */
--- 2.6.12-rc2-mm3/include/linux/nfs_mount.h.orig	2005-04-23 10:13:28.000000000 -0400
+++ 2.6.12-rc2-mm3/include/linux/nfs_mount.h	2005-04-23 11:25:48.000000000 -0400
@@ -61,6 +61,7 @@ struct nfs_mount_data {
 #define NFS_MOUNT_NOACL		0x0800	/* 4 */
 #define NFS_MOUNT_STRICTLOCK	0x1000	/* reserved for NFSv4 */
 #define NFS_MOUNT_SECFLAVOUR	0x2000	/* 5 */
+#define NFS_MOUNT_FSCACHE		0x3000
 #define NFS_MOUNT_FLAGMASK	0xFFFF
 
 #endif
--- 2.6.12-rc2-mm3/include/linux/nfs4_mount.h.orig	2005-03-02 02:38:09.000000000 -0500
+++ 2.6.12-rc2-mm3/include/linux/nfs4_mount.h	2005-04-23 11:25:48.000000000 -0400
@@ -65,6 +65,7 @@ struct nfs4_mount_data {
 #define NFS4_MOUNT_NOCTO	0x0010	/* 1 */
 #define NFS4_MOUNT_NOAC		0x0020	/* 1 */
 #define NFS4_MOUNT_STRICTLOCK	0x1000	/* 1 */
+#define NFS4_MOUNT_FSCACHE	0x2000	/* 1 */
 #define NFS4_MOUNT_FLAGMASK	0xFFFF
 
 #endif

[-- Attachment #3: 2.6.12-rc3-mm3-fscache-cookie-exist.patch --]
[-- Type: text/x-patch, Size: 669 bytes --]

Makes a second NFS mount fail with EEXIST instead of oopsing.

Signed-off-by: Steve Dickson <steved@redhat.com>

--- 2.6.12-rc3-mm3/fs/fscache/cookie.c.orig	2005-05-07 09:30:28.000000000 -0400
+++ 2.6.12-rc3-mm3/fs/fscache/cookie.c	2005-05-07 11:01:39.000000000 -0400
@@ -452,7 +452,11 @@ static int fscache_search_for_object(str
 		cache->ops->lock_node(node);
 
 		/* a node should only ever be attached to one cookie */
-		BUG_ON(!list_empty(&node->cookie_link));
+		if (!list_empty(&node->cookie_link)) {
+			cache->ops->unlock_node(node);
+			ret = -EEXIST;
+			goto error;
+		}
 
 		/* attach the node to the cache's node list */
 		if (list_empty(&node->cache_link)) {

[-- Attachment #4: 2.6.12-rc2-mm3-cachefs-wb.patch --]
[-- Type: text/x-patch, Size: 452 bytes --]

--- 2.6.12-rc2-mm3/fs/cachefs/journal.c.save	2005-04-27 08:06:03.000000000 -0400
+++ 2.6.12-rc2-mm3/fs/cachefs/journal.c	2005-05-03 11:11:17.000000000 -0400
@@ -682,6 +682,7 @@ static inline void cachefs_trans_batch_p
 		list_add_tail(&block->batch_link, plist);
 		block->writeback = block->page;
 		get_page(block->writeback);
+		SetPageWriteback(block->writeback);
 
 		/* make sure DMA can reach the data */
 		flush_dcache_page(block->writeback);




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: NFS Patch for FSCache
  2005-05-09 10:31 NFS Patch for FSCache Steve Dickson
@ 2005-05-09 21:19 ` Andrew Morton
  2005-05-10 18:43   ` Steve Dickson
  2005-05-10 19:12   ` [Linux-cachefs] " David Howells
  2005-06-13 12:52 ` Steve Dickson
  1 sibling, 2 replies; 10+ messages in thread
From: Andrew Morton @ 2005-05-09 21:19 UTC (permalink / raw)
  To: Steve Dickson; +Cc: linux-fsdevel, linux-cachefs

Steve Dickson <SteveD@redhat.com> wrote:
>
> Attached is a patch that enables NFS to use David Howells'
> File System Caching implementation (FSCache).

Do you have any performance results for this?


* Re: NFS Patch for FSCache
  2005-05-09 21:19 ` Andrew Morton
@ 2005-05-10 18:43   ` Steve Dickson
  2005-05-10 19:12   ` [Linux-cachefs] " David Howells
  1 sibling, 0 replies; 10+ messages in thread
From: Steve Dickson @ 2005-05-10 18:43 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-fsdevel, linux-cachefs

Andrew Morton wrote:
> Steve Dickson <SteveD@redhat.com> wrote:
> 
>>Attached is a patch that enables NFS to use David Howells'
>>File System Caching implementation (FSCache).
> 
> 
> Do you have any performance results for this?
I haven't done any formal performance testing, but from
the functionality testing I've done, I've seen
a ~20% increase in read speed (versus over-the-wire reads),
mainly because NFS only needs to do getattrs
and such when the data is cached. But buyer beware...
this is a very rough number, so mileage may vary. ;-)

I don't have a number for writes (maybe David does),
but I'm sure there will be a penalty to cache that
data; it's something that can be improved over time.

But the real saving, imho, is the fact those
reads were measured after the filesystem was
umount then remounted. So system wise, there
should be some gain due to the fact that NFS
is not using the network....

steved.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Linux-cachefs] Re: NFS Patch for FSCache
  2005-05-09 21:19 ` Andrew Morton
  2005-05-10 18:43   ` Steve Dickson
@ 2005-05-10 19:12   ` David Howells
  2005-05-14  2:18     ` Troy Benjegerdes
  2005-05-16 13:30     ` David Howells
  1 sibling, 2 replies; 10+ messages in thread
From: David Howells @ 2005-05-10 19:12 UTC (permalink / raw)
  To: Linux filesystem caching discussion list; +Cc: Andrew Morton, linux-fsdevel


Steve Dickson <SteveD@redhat.com> wrote:

> But the real saving, imho, is the fact those reads were measured after the
> filesystem was umount then remounted. So system wise, there should be some
> gain due to the fact that NFS is not using the network....

I tested md5sum read speed also. My testbox is a dual 200MHz PPro. It's got
128MB of RAM. I've got a 100MB file on the NFS server for it to read.

	No Cache:	~14s
	Cold Cache:	~15s
	Warm Cache:	~2s

Now these numbers are approximate because they're from memory.

Note that a cold cache is worse than no cache because CacheFS (a) has to check
the disk before NFS goes to the server, and (b) has to journal the allocations
of new data blocks. It may also have to wait whilst pages are written to disk
before it can get new ones rather than just dropping them (100MB is big enough
wrt 128MB that this will happen) and 100MB is sufficient to cause it to start
using single- and double-indirection pointers to find its blocks on disk,
though these are cached in the page cache.

David

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [Linux-cachefs] Re: NFS Patch for FSCache
@ 2005-05-12 22:43   ` Lever, Charles
  2005-05-13 11:17     ` David Howells
  0 siblings, 1 reply; 10+ messages in thread
From: Lever, Charles @ 2005-05-12 22:43 UTC (permalink / raw)
  To: David Howells, SteveD
  Cc: linux-fsdevel, Linux filesystem caching discussion list

preface: i think this is interesting and important work.

> Steve Dickson <SteveD@redhat.com> wrote:
> 
> > But the real saving, imho, is the fact those reads were measured
> > after the filesystem was umount then remounted. So system wise,
> > there should be some gain due to the fact that NFS is not using
> > the network....

i expect to see those gains when either the network and server are
slower than the client's local disk, or when the cached files are
significantly larger than the client's local RAM.  these conditions will
not always be the case, so i'm interested to know how performance is
affected when the system is running outside this sweet spot.

> I tested md5sum read speed also. My testbox is a dual 200MHz PPro.
> It's got 128MB of RAM. I've got a 100MB file on the NFS server for
> it to read.
> 
> 	No Cache:	~14s
> 	Cold Cache:	~15s
> 	Warm Cache:	~2s
> 
> Now these numbers are approximate because they're from memory.

to benchmark this i think you need to explore the architectural
weaknesses of your approach.  how bad will it get using cachefs with
badly designed applications or client/server setups?

for instance, what happens when the client's cache disk is much slower
than the server (high performance RAID with high speed networking)?
what happens when the client's cache disk fills up so the disk cache is
constantly turning over (which files are kicked out of your backing
cachefs to make room for new data)?  what happens with multi-threaded
I/O-bound applications when the cachefs is on a single spindle?  is
there any performance dependency on the size of the backing cachefs?

do you also cache directory contents on disk?

remember that the application you designed this for (preserving cache
contents across client reboots) is only one way this will be used.  some
of us would like to use this facility to provide a high-performance
local cache larger than the client's RAM.  :^)

> Note that a cold cache is worse than no cache because CacheFS (a) has
> to check the disk before NFS goes to the server, and (b) has to
> journal the allocations of new data blocks. It may also have to wait
> whilst pages are written to disk before it can get new ones rather
> than just dropping them (100MB is big enough wrt 128MB that this will
> happen) and 100MB is sufficient to cause it to start using single-
> and double-indirection pointers to find its blocks on disk, though
> these are cached in the page cache.

synchronous file system metadata management is the bane of every cachefs
implementation i know about.  have you measured what performance impact
there is when cache files go from no indirection to single indirect
blocks, or from single to double indirection?  have you measured how
expensive it is to reuse a single cache file because the cachefs file
system is already full?  how expensive is it to invalidate the data in
the cache (say, if some other client changes a file you already have
cached in your cachefs)?

what about using an extent-based file system for the backing cachefs?
that would probably not be too difficult because you have a good
prediction already of how large the file will be (just look at the file
size on the server).

how about using smallish chunks, like the AFS cache manager, to avoid
indirection entirely?  would there be any performance advantage to
caching small files in memory and large files on disk, or vice versa?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Re: NFS Patch for FSCache
  2005-05-10 19:12   ` [Linux-cachefs] " David Howells
@ 2005-05-14  2:18     ` Troy Benjegerdes
  2005-05-16 13:30     ` David Howells
  1 sibling, 0 replies; 10+ messages in thread
From: Troy Benjegerdes @ 2005-05-14  2:18 UTC (permalink / raw)
  To: Linux filesystem caching discussion list; +Cc: Andrew Morton, linux-fsdevel

On Tue, May 10, 2005 at 08:12:51PM +0100, David Howells wrote:
> 
> Steve Dickson <SteveD@redhat.com> wrote:
> 
> > But the real saving, imho, is the fact those reads were measured after the
> > filesystem was umount then remounted. So system wise, there should be some
> > gain due to the fact that NFS is not using the network....
> 
> I tested md5sum read speed also. My testbox is a dual 200MHz PPro. It's got
> 128MB of RAM. I've got a 100MB file on the NFS server for it to read.
> 
> 	No Cache:	~14s
> 	Cold Cache:	~15s
> 	Warm Cache:	~2s
> 
> Now these numbers are approximate because they're from memory.
> 
> Note that a cold cache is worse than no cache because CacheFS (a) has to check
> the disk before NFS goes to the server, and (b) has to journal the allocations
> of new data blocks. It may also have to wait whilst pages are written to disk
> before it can get new ones rather than just dropping them (100MB is big enough
> wrt 128MB that this will happen) and 100MB is sufficient to cause it to start
> using single- and double-indirection pointers to find its blocks on disk,
> though these are cached in the page cache.

How big was the cachefs filesystem?

Now try reading a 1GB file over nfs..

I have found (with openafs), that I either need a really small cache, or
a really big one.. The bigger the openafs cache gets, the slower it
goes. The only place i run with a > 1GB openafs cache is on an imap
server that has an 8gb cache for maildirs.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Linux-cachefs] Re: NFS Patch for FSCache
  2005-05-13 11:17     ` David Howells
@ 2005-05-16 12:47       ` David Howells
  2005-05-18 10:28         ` David Howells
  0 siblings, 1 reply; 10+ messages in thread
From: David Howells @ 2005-05-16 12:47 UTC (permalink / raw)
  To: Linux filesystem caching discussion list
  Cc: linux-fsdevel, Lever, Charles, SteveD

Troy Benjegerdes <hozer@hozed.org> wrote:

> I would like to suggest that cache culling be driven by a userspace
> daemon, with LRU usage being used as a fallback approach if the
> userspace app doesn't respond fast enough. Or at the least provide a way
> to load modules to provide different culling algorithms.

I suppose that shouldn't be too hard; the problem that I can see is deciding
how much space to reserve in an inode slot for culling decision data. For
instance, with the LRU approach, I just keep a 32-bit atime per inode which I
update any time I match that inode.

However, I can see ways in which the culling heuristic might be weighted:

 (*) Favour culling of large files over small ones or small over large.

 (*) Mark files for preferential retention, perhaps by source.

> If the server is responding and delivering files faster than we can
> write them to local disk and cull space, should we really be caching at
> all? Is it even appropriate for the kernel to make that decision?

That's a very tricky question. It's most likely to come up when the
network + server retrieval speeds are better than the disk retrieval
speeds, in which case you shouldn't be using a cache at all, except
where you want to be able to live without the server for some reason
or other; there the cache is being used to enhance reliability, not
speed.

However, we do need to control turnover on the cache. If the rate is greater
than we can sustain, there needs to be a way to suspend caching on certain
files, but what the heuristic for that should be, I'm not sure.

CacheFS on one very large file nets me something like a 700% performance
increase[*] on an old dual-PPro machine with a warm cache on a 7200rpm HDD vs a
100Mbps network link to the server. The file is larger than the machine's
memory. I believe Steve Dickson doesn't really see any advantage of using the
cache with a 1Gbps network link to his server.

David

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Re: NFS Patch for FSCache
  2005-05-10 19:12   ` [Linux-cachefs] " David Howells
  2005-05-14  2:18     ` Troy Benjegerdes
@ 2005-05-16 13:30     ` David Howells
  1 sibling, 0 replies; 10+ messages in thread
From: David Howells @ 2005-05-16 13:30 UTC (permalink / raw)
  To: Linux filesystem caching discussion list; +Cc: Andrew Morton, linux-fsdevel


Troy Benjegerdes <hozer@hozed.org> wrote:

> How big was the cachefs filesystem?

Several Gig. I don't remember how big, but the disk I tried it on is totally
kaput unfortunately.

> Now try reading a 1GB file over nfs..

I'll give that a go at some point. However, I suspect that any size over twice
the amount of pagecache available is going to scale fairly consistently until
you start hitting the lid on the cache. I say twice because firstly you fill
the pagecache with pages and start throwing them at the disk, and then you
have to start on a rolling process of waiting for those to hit the disk before
evicting them from the pagecache, which isn't going to get going smoothly
until you've ejected the original load of pages.

> I have found (with openafs), that I either need a really small cache, or
> a really big one.. The bigger the openafs cache gets, the slower it
> goes. The only place i run with a > 1GB openafs cache is on an imap
> server that has an 8gb cache for maildirs.

What filesystem underlies your OpenAFS cache? OpenAFS doesn't actually do its
own file to disk management within the cache, but uses a host filesystem for
that.

David

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Linux-cachefs] Re: NFS Patch for FSCache
  2005-05-16 12:47       ` [Linux-cachefs] " David Howells
@ 2005-05-18 10:28         ` David Howells
  0 siblings, 0 replies; 10+ messages in thread
From: David Howells @ 2005-05-18 10:28 UTC (permalink / raw)
  To: Linux filesystem caching discussion list
  Cc: linux-fsdevel, Lever, Charles, SteveD

David Masover <ninja@slaphack.com> wrote:

> Does the cache call sync/fsync overly often?

Not at all.

> If not, we can gain something by using an underlying FS with lazy writes.

Yes, to some extent. There's still the problem of filesystem integrity to deal
with, and lazy writes hold up journal closure. This isn't necessarily a
problem, except when you want to delete and launder a block that has a write
hanging over it. It's not unsolvable, just tricky.

Besides, what do you mean by lazy?

Also consider: you probably want to start netfs data writes as soon as
possible as not having cached the page yet restricts the netfs's activities on
that page; but you want to defer metadata writes as long as possible because
they may become obsolete, it may be possible to batch them and it may be
possible to merge them.

> I think the caching should be done asynchronously.  As stuff comes in,
> it should be handed off both to the app requesting it and to a queue to
> write it to the cache.  If the queue gets too full, start dropping stuff
> from it the same way you do from cache -- probably LRU or LFU or
> something similar.

That's not a bad idea; we need a rate limit on throwing stuff at the cache in
the situation where there's not much disk space available.

Actually, probably the biggest bottleneck is the disk block allocator. Given
that I'm using lists of free blocks, it's difficult to place a tentative
reservation on a block, and it very much favours allocating blocks for one
transaction at a time. However, free lists make block recycling a lot easier.

I could use a bitmap instead; but that requires every block allocated or
deleted be listed in the journal. Not only that but it complicates deletion
and journal replay. Also, under worst case conditions it's really nasty
because you could end up with a situation where you've got a whole set of
bitmaps, each with one free block; that means you've got to read a whole lot
of bitmaps to allocate the blocks you require, and you have to modify several
of them to seal an allocation. Furthermore, you end up losing a chunk of space
statically allocated to the maintenance of these things, unless you want to
allocate the bitmaps dynamically also...

> Another question -- how much performance do we lose by caching, assuming
> that both the network/server and the local disk are infinitely fast?
> That is, how many cycles do we lose vs. local disk access?  Basically,
> I'm looking for something that does what InterMezzo was supposed to --
> make cache access almost as fast as local access, so that I can replace
> all local stuff with a cache.

Well, with infinitely fast disk and network, very little - you can afford to
be profligate on your turnover of disk space, and this affects the options you
might choose in designing your cache.

The real-world case is more interesting as you have to compromise. With
CacheFS as it stands, it attempts not to lose any data blocks, and it attempts
not to return uninitialised data, and these two constraints work counter to
each other. There's a second journal (the validity journal) to record blocks
that have been allocated but that don't yet have data stored therein. This
permits advance allocation, but requires a second update journal entry to
clear the validity journal entry after the data has been stored. It also
requires the validity journal to be replayed upon mounting.

Reading one really big file (bigger than the memory available) over AFS, with
a cold cache it took very roughly 107% of the time it took with no cache; but
using a warm cache, it took 14% of the time it took with no cache. However,
this is on my particular test box, and it varies a lot from box to box.

This doesn't really demonstrate the latency on indexing, however; that we have
to do before we even consider touching the network. I don't have numbers on
that, but in the worst case they're going to be quite bad.

I'm currently working on mark II CacheFS, using a wandering tree to maintain
the index. I'm not entirely sure whether I want to include the data pointers
in this tree. There are advantages to doing so: namely that I can use the same
tree maintenance routines for everything, but also disadvantages: namely that
it complicates deletion a lot.

Using a wandering tree will cut the latency on index lookups (because it's a
tree), and simplify journalling (wandering) and mean I can just grab a block,
write to it and then connect it (wandering). Block allocation is still
unpleasant though...

David

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: NFS Patch for FSCache
  2005-05-09 10:31 NFS Patch for FSCache Steve Dickson
  2005-05-09 21:19 ` Andrew Morton
@ 2005-06-13 12:52 ` Steve Dickson
  1 sibling, 0 replies; 10+ messages in thread
From: Steve Dickson @ 2005-06-13 12:52 UTC (permalink / raw)
  To: Linux filesystem caching discussion list
  Cc: Andrew Morton, linux-fsdevel, Trond Myklebust

[-- Attachment #1: Type: text/plain, Size: 349 bytes --]

I noticed that a number of NFS patches went into
2.6.12-rc6-mm1, so I wanted to make sure the patches I
posted, which allow NFS to use cachefs, had not
become stale. It turns out they hadn't, and they
still work as they did in rc3-mm3... But I figured
I would repost them anyway in hopes of getting them
reviewed and accepted in the -mm tree....


steved.



[-- Attachment #2: 2.6.12-rc6-mm1-nfs-fscache.patch --]
[-- Type: text/x-patch, Size: 26574 bytes --]

This patch enables NFS to use file system caching (i.e. FSCache).
To turn this feature on, you must specify the -o fsc mount flag
and have a cachefs partition mounted.

Signed-off-by: Steve Dickson <steved@redhat.com>


--- 2.6.12-rc5-mm2/fs/nfs/file.c.orig	2005-06-05 11:44:35.000000000 -0400
+++ 2.6.12-rc5-mm2/fs/nfs/file.c	2005-06-05 11:44:48.000000000 -0400
@@ -27,9 +27,11 @@
 #include <linux/slab.h>
 #include <linux/pagemap.h>
 #include <linux/smp_lock.h>
+#include <linux/buffer_head.h>
 
 #include <asm/uaccess.h>
 #include <asm/system.h>
+#include "nfs-fscache.h"
 
 #include "delegation.h"
 
@@ -194,6 +196,12 @@ nfs_file_sendfile(struct file *filp, lof
 	return res;
 }
 
+static int nfs_file_page_mkwrite(struct vm_area_struct *vma, struct page *page)
+{
+	wait_on_page_fs_misc(page);
+	return 0;
+}
+
 static int
 nfs_file_mmap(struct file * file, struct vm_area_struct * vma)
 {
@@ -207,6 +215,10 @@ nfs_file_mmap(struct file * file, struct
 	status = nfs_revalidate_inode(NFS_SERVER(inode), inode);
 	if (!status)
 		status = generic_file_mmap(file, vma);
+
+	if (NFS_SERVER(inode)->flags & NFS_MOUNT_FSCACHE)
+		vma->vm_ops->page_mkwrite = nfs_file_page_mkwrite;
+
 	return status;
 }
 
@@ -258,6 +270,11 @@ static int nfs_commit_write(struct file 
 	return status;
 }
 
+/*
+ * since we use page->private for our own nefarious purposes when using fscache, we have to
+ * override extra address space ops to prevent fs/buffer.c from getting confused, even though we
+ * may not have asked its opinion
+ */
 struct address_space_operations nfs_file_aops = {
 	.readpage = nfs_readpage,
 	.readpages = nfs_readpages,
@@ -269,6 +286,11 @@ struct address_space_operations nfs_file
 #ifdef CONFIG_NFS_DIRECTIO
 	.direct_IO = nfs_direct_IO,
 #endif
+#ifdef CONFIG_NFS_FSCACHE
+	.sync_page	= block_sync_page,
+	.releasepage	= nfs_releasepage,
+	.invalidatepage	= nfs_invalidatepage,
+#endif
 };
 
 /* 
--- 2.6.12-rc5-mm2/fs/nfs/inode.c.orig	2005-06-05 11:44:35.000000000 -0400
+++ 2.6.12-rc5-mm2/fs/nfs/inode.c	2005-06-05 11:44:48.000000000 -0400
@@ -42,6 +42,8 @@
 #include "nfs4_fs.h"
 #include "delegation.h"
 
+#include "nfs-fscache.h"
+
 #define NFSDBG_FACILITY		NFSDBG_VFS
 #define NFS_PARANOIA 1
 
@@ -169,6 +171,10 @@ nfs_clear_inode(struct inode *inode)
 	cred = nfsi->cache_access.cred;
 	if (cred)
 		put_rpccred(cred);
+
+	if (NFS_SERVER(inode)->flags & NFS_MOUNT_FSCACHE)
+		nfs_clear_fscookie(nfsi);
+
 	BUG_ON(atomic_read(&nfsi->data_updates) != 0);
 }
 
@@ -503,6 +509,9 @@ nfs_fill_super(struct super_block *sb, s
 			server->namelen = NFS2_MAXNAMLEN;
 	}
 
+	if (server->flags & NFS_MOUNT_FSCACHE)
+		nfs_fill_fscookie(sb);
+
 	sb->s_op = &nfs_sops;
 	return nfs_sb_init(sb, authflavor);
 }
@@ -579,6 +588,7 @@ static int nfs_show_options(struct seq_f
 		{ NFS_MOUNT_NOAC, ",noac", "" },
 		{ NFS_MOUNT_NONLM, ",nolock", ",lock" },
 		{ NFS_MOUNT_NOACL, ",noacl", "" },
+		{ NFS_MOUNT_FSCACHE, ",fscache", "" },
 		{ 0, NULL, NULL }
 	};
 	struct proc_nfs_info *nfs_infop;
@@ -623,6 +633,9 @@ nfs_zap_caches(struct inode *inode)
 		nfsi->flags |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA|NFS_INO_INVALID_ACCESS|NFS_INO_INVALID_ACL;
 	else
 		nfsi->flags |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_ACCESS|NFS_INO_INVALID_ACL;
+
+	if (NFS_SERVER(inode)->flags & NFS_MOUNT_FSCACHE)
+		nfs_zap_fscookie(nfsi);
 }
 
 static void nfs_zap_acl_cache(struct inode *inode)
@@ -770,6 +783,9 @@ nfs_fhget(struct super_block *sb, struct
 		memset(nfsi->cookieverf, 0, sizeof(nfsi->cookieverf));
 		nfsi->cache_access.cred = NULL;
 
+		if (NFS_SB(sb)->flags & NFS_MOUNT_FSCACHE)
+			nfs_fhget_fscookie(sb, nfsi);
+
 		unlock_new_inode(inode);
 	} else
 		nfs_refresh_inode(inode, fattr);
@@ -1076,6 +1092,9 @@ __nfs_revalidate_inode(struct nfs_server
 				(long long)NFS_FILEID(inode));
 		/* This ensures we revalidate dentries */
 		nfsi->cache_change_attribute++;
+
+		if (server->flags & NFS_MOUNT_FSCACHE)
+			nfs_renew_fscookie(server, nfsi);
 	}
 	if (flags & NFS_INO_INVALID_ACL)
 		nfs_zap_acl_cache(inode);
@@ -1515,6 +1534,14 @@ static struct super_block *nfs_get_sb(st
 		goto out_err;
 	}
 
+#ifndef CONFIG_NFS_FSCACHE
+	if (data->flags & NFS_MOUNT_FSCACHE) {
+		printk(KERN_WARNING "NFS: kernel not compiled with CONFIG_NFS_FSCACHE\n");
+		kfree(server);
+		return ERR_PTR(-EINVAL);
+	}
+#endif
+
 	s = sget(fs_type, nfs_compare_super, nfs_set_super, server);
 	if (IS_ERR(s) || s->s_root)
 		goto out_rpciod_down;
@@ -1542,6 +1569,9 @@ static void nfs_kill_super(struct super_
 
 	kill_anon_super(s);
 
+	if (server->flags & NFS_MOUNT_FSCACHE)
+		nfs_kill_fscookie(server);
+
 	if (server->client != NULL && !IS_ERR(server->client))
 		rpc_shutdown_client(server->client);
 	if (server->client_sys != NULL && !IS_ERR(server->client_sys))
@@ -1760,6 +1790,9 @@ static int nfs4_fill_super(struct super_
 
 	sb->s_time_gran = 1;
 
+	if (server->flags & NFS4_MOUNT_FSCACHE)
+		nfs4_fill_fscookie(sb);
+
 	sb->s_op = &nfs4_sops;
 	err = nfs_sb_init(sb, authflavour);
 	if (err == 0)
@@ -1903,6 +1936,9 @@ static void nfs4_kill_super(struct super
 	nfs_return_all_delegations(sb);
 	kill_anon_super(sb);
 
+	if (server->flags & NFS_MOUNT_FSCACHE)
+		nfs_kill_fscookie(server);
+
 	nfs4_renewd_prepare_shutdown(server);
 
 	if (server->client != NULL && !IS_ERR(server->client))
@@ -2021,6 +2057,11 @@ static int __init init_nfs_fs(void)
 {
 	int err;
 
+	/* we want to be able to cache */
+	err = nfs_register_netfs();
+	if (err < 0)
+		goto out5;
+
 	err = nfs_init_nfspagecache();
 	if (err)
 		goto out4;
@@ -2068,6 +2109,9 @@ out2:
 out3:
 	nfs_destroy_nfspagecache();
 out4:
+	nfs_unregister_netfs();
+out5:
+
 	return err;
 }
 
@@ -2080,6 +2124,7 @@ static void __exit exit_nfs_fs(void)
 	nfs_destroy_readpagecache();
 	nfs_destroy_inodecache();
 	nfs_destroy_nfspagecache();
+	nfs_unregister_netfs();
 #ifdef CONFIG_PROC_FS
 	rpc_proc_unregister("nfs");
 #endif
--- 2.6.12-rc5-mm2/fs/nfs/Makefile.orig	2005-06-05 11:44:35.000000000 -0400
+++ 2.6.12-rc5-mm2/fs/nfs/Makefile	2005-06-05 11:44:48.000000000 -0400
@@ -13,4 +13,5 @@ nfs-$(CONFIG_NFS_V4)	+= nfs4proc.o nfs4x
 			   delegation.o idmap.o \
 			   callback.o callback_xdr.o callback_proc.o
 nfs-$(CONFIG_NFS_DIRECTIO) += direct.o
+nfs-$(CONFIG_NFS_FSCACHE) += nfs-fscache.o
 nfs-objs		:= $(nfs-y)
--- /dev/null	2005-06-05 03:42:13.591137792 -0400
+++ 2.6.12-rc5-mm2/fs/nfs/nfs-fscache.c	2005-06-05 11:44:48.000000000 -0400
@@ -0,0 +1,191 @@
+/* nfs-fscache.c: NFS filesystem cache interface
+ *
+ * Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+
+#include <linux/config.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/nfs_fs.h>
+#include <linux/nfs_fs_sb.h>
+
+#include "nfs-fscache.h"
+
+#define NFS_CACHE_FH_INDEX_SIZE sizeof(struct nfs_fh)
+
+/*
+ * the root index is
+ */
+static struct fscache_page *nfs_cache_get_page_token(struct page *page);
+
+static struct fscache_netfs_operations nfs_cache_ops = {
+	.get_page_token	= nfs_cache_get_page_token,
+};
+
+struct fscache_netfs nfs_cache_netfs = {
+	.name			= "nfs",
+	.version		= 0,
+	.ops			= &nfs_cache_ops,
+};
+
+/*
+ * the root index for the filesystem is defined by nfsd IP address and ports
+ */
+static fscache_match_val_t nfs_cache_server_match(void *target,
+						  const void *entry);
+static void nfs_cache_server_update(void *source, void *entry);
+
+struct fscache_index_def nfs_cache_server_index_def = {
+	.name			= "servers",
+	.data_size		= 18,
+	.keys[0]		= { FSCACHE_INDEX_KEYS_IPV6ADDR, 16 },
+	.keys[1]		= { FSCACHE_INDEX_KEYS_BIN, 2 },
+	.match			= nfs_cache_server_match,
+	.update			= nfs_cache_server_update,
+};
+
+/*
+ * the primary index for each server is simply made up of a series of NFS file
+ * handles
+ */
+static fscache_match_val_t nfs_cache_fh_match(void *target, const void *entry);
+static void nfs_cache_fh_update(void *source, void *entry);
+
+struct fscache_index_def nfs_cache_fh_index_def = {
+	.name			= "fh",
+	.data_size		= NFS_CACHE_FH_INDEX_SIZE,
+	.keys[0]		= { FSCACHE_INDEX_KEYS_BIN_SZ2,
+				    sizeof(struct nfs_fh) },
+	.match			= nfs_cache_fh_match,
+	.update			= nfs_cache_fh_update,
+};
+
+/*
+ * get a page token for the specified page
+ * - the token will be attached to page->private and PG_private will be set on
+ *   the page
+ */
+static struct fscache_page *nfs_cache_get_page_token(struct page *page)
+{
+	return fscache_page_get_private(page, GFP_NOIO);
+}
+
+static const uint8_t nfs_cache_ipv6_wrapper_for_ipv4[12] = {
+	[0 ... 9]	= 0x00,
+	[10 ... 11]	= 0xff
+};
+
+/*
+ * match a server record obtained from the cache
+ */
+static fscache_match_val_t nfs_cache_server_match(void *target,
+						  const void *entry)
+{
+	struct nfs_server *server = target;
+	const uint8_t *data = entry;
+
+	switch (server->addr.sin_family) {
+	case AF_INET:
+		if (memcmp(data + 0,
+			   &nfs_cache_ipv6_wrapper_for_ipv4,
+			   12) != 0)
+			break;
+
+		if (memcmp(data + 12, &server->addr.sin_addr, 4) != 0)
+			break;
+
+		if (memcmp(data + 16, &server->addr.sin_port, 2) != 0)
+			break;
+
+		return FSCACHE_MATCH_SUCCESS;
+
+	case AF_INET6:
+		if (memcmp(data + 0, &server->addr.sin_addr, 16) != 0)
+			break;
+
+		if (memcmp(data + 16, &server->addr.sin_port, 2) != 0)
+			break;
+
+		return FSCACHE_MATCH_SUCCESS;
+
+	default:
+		break;
+	}
+
+	return FSCACHE_MATCH_FAILED;
+}
+
+/*
+ * update a server record in the cache
+ */
+static void nfs_cache_server_update(void *source, void *entry)
+{
+	struct nfs_server *server = source;
+	uint8_t *data = entry;
+
+	switch (server->addr.sin_family) {
+	case AF_INET:
+		memcpy(data + 0, &nfs_cache_ipv6_wrapper_for_ipv4, 12);
+		memcpy(data + 12, &server->addr.sin_addr, 4);
+		memcpy(data + 16, &server->addr.sin_port, 2);
+		return;
+
+	case AF_INET6:
+		memcpy(data + 0, &server->addr.sin_addr, 16);
+		memcpy(data + 16, &server->addr.sin_port, 2);
+		return;
+
+	default:
+		return;
+	}
+}
+
+/*
+ * match a file handle record obtained from the cache
+ */
+static fscache_match_val_t nfs_cache_fh_match(void *target, const void *entry)
+{
+	struct nfs_inode *nfsi = target;
+	const uint8_t *data = entry;
+	uint16_t nsize;
+
+	/* check the file handle matches */
+	memcpy(&nsize, data, 2);
+	nsize = ntohs(nsize);
+
+	if (nsize <= NFS_CACHE_FH_INDEX_SIZE && nfsi->fh.size == nsize) {
+		if (memcmp(data + 2, nfsi->fh.data, nsize) == 0) {
+			return FSCACHE_MATCH_SUCCESS;
+		}
+	}
+
+	return FSCACHE_MATCH_FAILED;
+}
+
+/*
+ * update a fh record in the cache
+ */
+static void nfs_cache_fh_update(void *source, void *entry)
+{
+	struct nfs_inode *nfsi = source;
+	uint16_t nsize;
+	uint8_t *data = entry;
+
+	BUG_ON(nfsi->fh.size > NFS_CACHE_FH_INDEX_SIZE - 2);
+
+	/* set the file handle */
+	nsize = htons(nfsi->fh.size);
+	memcpy(data, &nsize, 2);
+	memcpy(data + 2, &nfsi->fh.data, nfsi->fh.size);
+	memset(data + 2 + nfsi->fh.size,
+	       FSCACHE_INDEX_DEADFILL_PATTERN,
+	       NFS_CACHE_FH_INDEX_SIZE - 2 - nfsi->fh.size);
+}
--- /dev/null	2005-06-05 03:42:13.591137792 -0400
+++ 2.6.12-rc5-mm2/fs/nfs/nfs-fscache.h	2005-06-05 11:44:48.000000000 -0400
@@ -0,0 +1,158 @@
+/* nfs-fscache.h: NFS filesystem cache interface definitions
+ *
+ * Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _NFS_FSCACHE_H
+#define _NFS_FSCACHE_H
+
+#include <linux/nfs_mount.h>
+#include <linux/nfs4_mount.h>
+#include <linux/fscache.h>
+
+#ifdef CONFIG_NFS_FSCACHE
+#ifndef CONFIG_FSCACHE
+#error "CONFIG_NFS_FSCACHE is defined but not CONFIG_FSCACHE"
+#endif
+
+extern struct fscache_netfs nfs_cache_netfs;
+extern struct fscache_index_def nfs_cache_server_index_def;
+extern struct fscache_index_def nfs_cache_fh_index_def;
+
+extern int nfs_invalidatepage(struct page *, unsigned long);
+extern int nfs_releasepage(struct page *, int);
+extern int nfs_mkwrite(struct page *);
+
+static inline void 
+nfs_renew_fscookie(struct nfs_server *server, struct nfs_inode *nfsi)
+{
+	struct fscache_cookie *old =  nfsi->fscache;
+
+	/* retire the current fscache cache and get a new one */
+	fscache_relinquish_cookie(nfsi->fscache, 1);
+	nfsi->fscache = fscache_acquire_cookie(server->fscache, NULL, nfsi);
+
+	dfprintk(FSCACHE,
+		"NFS: revalidation new cookie (0x%p/0x%p/0x%p/0x%p)\n",
+		server, nfsi, old, nfsi->fscache);
+
+	return;
+}
+static inline void
+nfs4_fill_fscookie(struct super_block *sb)
+{
+	struct nfs_server *server = NFS_SB(sb);
+
+	/* create a cache index for looking up filehandles */
+	server->fscache = fscache_acquire_cookie(nfs_cache_netfs.primary_index,
+			       &nfs_cache_fh_index_def, server);
+	if (server->fscache == NULL) {
+		printk(KERN_WARNING "NFS4: No Fscache cookie. Turning Fscache off!\n");
+	} else /* reuse the NFS mount option */
+		server->flags |= NFS_MOUNT_FSCACHE;
+
+	dfprintk(FSCACHE,"NFS: nfs4 cookie (0x%p,0x%p/0x%p)\n", 
+		sb, server, server->fscache);
+
+	return;
+}
+static inline void
+nfs_fill_fscookie(struct super_block *sb)
+{
+	struct nfs_server *server = NFS_SB(sb);
+
+	/* create a cache index for looking up filehandles */
+	server->fscache = fscache_acquire_cookie(nfs_cache_netfs.primary_index,
+			      &nfs_cache_fh_index_def, server);
+	if (server->fscache == NULL) {
+		server->flags &= ~NFS_MOUNT_FSCACHE;
+		printk(KERN_WARNING "NFS: No Fscache cookie. Turning Fscache off!\n");
+	}
+	dfprintk(FSCACHE,"NFS: cookie (0x%p/0x%p/0x%p)\n", 
+		sb, server, server->fscache);
+
+	return;
+}
+static inline void
+nfs_fhget_fscookie(struct super_block *sb, struct nfs_inode *nfsi)
+{
+	struct nfs_server *server = NFS_SB(sb);
+
+	nfsi->fscache = fscache_acquire_cookie(server->fscache, NULL, nfsi);
+	if (server->fscache == NULL)
+		printk(KERN_WARNING "NFS: NULL FScache cookie: sb 0x%p nfsi 0x%p\n", sb, nfsi);
+
+	dfprintk(FSCACHE, "NFS: fhget new cookie (0x%p/0x%p/0x%p)\n",
+		sb, nfsi, nfsi->fscache);
+
+	return;
+}
+static inline void 
+nfs_kill_fscookie(struct nfs_server *server)
+{
+	dfprintk(FSCACHE,"NFS: killing cookie (0x%p/0x%p)\n",
+		server, server->fscache);
+
+	fscache_relinquish_cookie(server->fscache, 0);
+	server->fscache = NULL;
+
+	return;
+}
+static inline void
+nfs_clear_fscookie(struct nfs_inode *nfsi)
+{
+	dfprintk(FSCACHE, "NFS: clear cookie (0x%p/0x%p)\n",
+			nfsi, nfsi->fscache);
+
+	fscache_relinquish_cookie(nfsi->fscache, 0);
+	nfsi->fscache = NULL;
+
+	return;
+}
+static inline void
+nfs_zap_fscookie(struct nfs_inode *nfsi)
+{
+	dfprintk(FSCACHE,"NFS: zapping cookie (0x%p/0x%p)\n",
+		nfsi, nfsi->fscache);
+
+	fscache_relinquish_cookie(nfsi->fscache, 1);
+	nfsi->fscache = NULL;
+
+	return;
+}
+static inline int
+nfs_register_netfs(void)
+{
+	int err;
+
+	err = fscache_register_netfs(&nfs_cache_netfs, &nfs_cache_server_index_def);
+
+	return err;
+}
+static inline void
+nfs_unregister_netfs(void)
+{
+	fscache_unregister_netfs(&nfs_cache_netfs);
+
+	return;
+}
+#else
+static inline void nfs_fill_fscookie(struct super_block *sb) {}
+static inline void nfs_fhget_fscookie(struct super_block *sb, struct nfs_inode *nfsi) {}
+static inline void nfs4_fill_fscookie(struct super_block *sb) {}
+static inline void nfs_kill_fscookie(struct nfs_server *server) {}
+static inline void nfs_clear_fscookie(struct nfs_inode *nfsi) {}
+static inline void nfs_zap_fscookie(struct nfs_inode *nfsi) {}
+static inline void 
+	nfs_renew_fscookie(struct nfs_server *server, struct nfs_inode *nfsi) {}
+static inline int nfs_register_netfs(void) { return 0; }
+static inline void nfs_unregister_netfs(void) {}
+
+#endif
+#endif /* _NFS_FSCACHE_H */
--- 2.6.12-rc5-mm2/fs/nfs/read.c.orig	2005-06-05 11:44:35.000000000 -0400
+++ 2.6.12-rc5-mm2/fs/nfs/read.c	2005-06-05 11:44:48.000000000 -0400
@@ -27,6 +27,7 @@
 #include <linux/sunrpc/clnt.h>
 #include <linux/nfs_fs.h>
 #include <linux/nfs_page.h>
+#include <linux/nfs_mount.h>
 #include <linux/smp_lock.h>
 
 #include <asm/system.h>
@@ -73,6 +74,47 @@ int nfs_return_empty_page(struct page *p
 	return 0;
 }
 
+#ifdef CONFIG_NFS_FSCACHE
+/*
+ * store a newly fetched page in fscache
+ */
+static void
+nfs_readpage_to_fscache_complete(void *cookie_data, struct page *page, void *data, int error)
+{
+	dprintk("NFS:     readpage_to_fscache_complete (%p/%p/%p/%d)\n",
+		cookie_data, page, data, error);
+
+	end_page_fs_misc(page);
+}
+
+static inline void
+nfs_readpage_to_fscache(struct inode *inode, struct page *page, int sync)
+{
+	int ret;
+
+	dprintk("NFS: readpage_to_fscache(0x%p/0x%p/0x%p/%d)\n",
+		NFS_I(inode)->fscache, page, inode, sync);
+
+	SetPageFsMisc(page);
+	ret = fscache_write_page(NFS_I(inode)->fscache, page,
+		nfs_readpage_to_fscache_complete, NULL, GFP_KERNEL);
+	if (ret != 0) {
+		dprintk("NFS:     readpage_to_fscache: error %d\n", ret);
+		fscache_uncache_page(NFS_I(inode)->fscache, page);
+		ClearPageFsMisc(page);
+	}
+
+	unlock_page(page);
+}
+#else
+static inline void
+nfs_readpage_to_fscache(struct inode *inode, struct page *page, int sync)
+{
+	BUG();
+}
+#endif
+
+
 /*
  * Read a page synchronously.
  */
@@ -149,6 +191,13 @@ static int nfs_readpage_sync(struct nfs_
 		ClearPageError(page);
 	result = 0;
 
+	if (NFS_SERVER(inode)->flags & NFS_MOUNT_FSCACHE)
+		nfs_readpage_to_fscache(inode, page, 1);
+	else
+		unlock_page(page);
+
+	return result;
+
 io_error:
 	unlock_page(page);
 	nfs_readdata_free(rdata);
@@ -180,7 +229,13 @@ static int nfs_readpage_async(struct nfs
 
 static void nfs_readpage_release(struct nfs_page *req)
 {
-	unlock_page(req->wb_page);
+	struct inode *d_inode = req->wb_context->dentry->d_inode;
+
+	if ((NFS_SERVER(d_inode)->flags & NFS_MOUNT_FSCACHE) && 
+			PageUptodate(req->wb_page))
+		nfs_readpage_to_fscache(d_inode, req->wb_page, 0);
+	else
+		unlock_page(req->wb_page);
 
 	nfs_clear_request(req);
 	nfs_release_request(req);
@@ -477,6 +532,67 @@ void nfs_readpage_result(struct rpc_task
 	data->complete(data, status);
 }
 
+
+/*
+ * Read a page through the on-disc cache if possible
+ */
+#ifdef CONFIG_NFS_FSCACHE
+static void
+nfs_readpage_from_fscache_complete(void *cookie_data, struct page *page, void *data, int error)
+{
+	dprintk("NFS: readpage_from_fscache_complete (0x%p/0x%p/0x%p/%d)\n",
+		cookie_data, page, data, error);
+
+	if (error)
+		SetPageError(page);
+	else
+		SetPageUptodate(page);
+
+	unlock_page(page);
+}
+
+static inline int
+nfs_readpage_from_fscache(struct inode *inode, struct page *page)
+{
+	struct fscache_page *pageio;
+	int ret;
+
+	dprintk("NFS: readpage_from_fscache(0x%p/0x%p/0x%p)\n",
+		NFS_I(inode)->fscache, page, inode);
+
+	pageio = fscache_page_get_private(page, GFP_NOIO);
+	if (IS_ERR(pageio)) {
+		dprintk("NFS:     fscache_page_get_private error %ld\n", PTR_ERR(pageio));
+		return PTR_ERR(pageio);
+	}
+
+	ret = fscache_read_or_alloc_page(NFS_I(inode)->fscache,
+					 page,
+					 nfs_readpage_from_fscache_complete,
+					 NULL,
+					 GFP_KERNEL);
+
+	switch (ret) {
+	case 1: /* read BIO submitted and wb-journal entry found */
+		BUG();
+		
+	case 0: /* read BIO submitted (page in fscache) */
+		return ret;
+
+	case -ENOBUFS: /* inode not in cache */
+	case -ENODATA: /* page not in cache */
+		dprintk("NFS:     fscache_read_or_alloc_page error %d\n", ret);
+		return 1;
+
+	default:
+		return ret;
+	}
+}
+#else
+static inline int 
+nfs_readpage_from_fscache(struct inode *inode, struct page *page) { return 1; }
+#endif
+
 /*
  * Read a page over NFS.
  * We read the page synchronously in the following case:
@@ -510,6 +626,13 @@ int nfs_readpage(struct file *file, stru
 		ctx = get_nfs_open_context((struct nfs_open_context *)
 				file->private_data);
 	if (!IS_SYNC(inode)) {
+		if (NFS_SERVER(inode)->flags & NFS_MOUNT_FSCACHE) {
+			error = nfs_readpage_from_fscache(inode, page);
+			if (error < 0)
+				goto out_error;
+			if (error == 0)
+				return error;
+		}
 		error = nfs_readpage_async(ctx, inode, page);
 		goto out;
 	}
@@ -540,6 +663,15 @@ readpage_async_filler(void *data, struct
 	unsigned int len;
 
 	nfs_wb_page(inode, page);
+
+	if (NFS_SERVER(inode)->flags & NFS_MOUNT_FSCACHE) {
+		int error = nfs_readpage_from_fscache(inode, page);
+		if (error < 0)
+			return error;
+		if (error == 0)
+			return error;
+	}
+
 	len = nfs_page_length(inode, page);
 	if (len == 0)
 		return nfs_return_empty_page(page);
@@ -613,3 +745,61 @@ void nfs_destroy_readpagecache(void)
 	if (kmem_cache_destroy(nfs_rdata_cachep))
 		printk(KERN_INFO "nfs_read_data: not all structures were freed\n");
 }
+
+#ifdef CONFIG_NFS_FSCACHE
+int nfs_invalidatepage(struct page *page, unsigned long offset)
+{
+	int ret = 1;
+	struct nfs_server *server = NFS_SERVER(page->mapping->host);
+
+	BUG_ON(!PageLocked(page));
+
+	if (server->flags & NFS_MOUNT_FSCACHE) {
+		if (PagePrivate(page)) {
+			struct nfs_inode *nfsi = NFS_I(page->mapping->host);
+
+			dfprintk(PAGECACHE,"NFS: fscache invalidatepage (0x%p/0x%p/0x%p)\n",
+				nfsi->fscache, page, nfsi);
+
+			fscache_uncache_page(nfsi->fscache, page);
+
+			if (offset == 0) {
+				BUG_ON(!PageLocked(page));
+				ret = 0;
+				if (!PageWriteback(page))
+					ret = page->mapping->a_ops->releasepage(page, 0);
+			}
+		}
+	} else
+		ret = 0;
+
+	return ret;
+}
+int nfs_releasepage(struct page *page, int gfp_flags)
+{
+	struct fscache_page *pageio;
+	struct nfs_server *server = NFS_SERVER(page->mapping->host);
+
+	if (server->flags & NFS_MOUNT_FSCACHE && PagePrivate(page)) {
+		struct nfs_inode *nfsi = NFS_I(page->mapping->host);
+
+		dfprintk(PAGECACHE,"NFS: fscache releasepage (0x%p/0x%p/0x%p)\n",
+			nfsi->fscache, page, nfsi);
+
+		fscache_uncache_page(nfsi->fscache, page);
+		pageio = (struct fscache_page *) page->private;
+		page->private = 0;
+		ClearPagePrivate(page);
+
+		if (pageio)
+			kfree(pageio);
+	}
+
+	return 0;
+}
+int nfs_mkwrite(struct page *page)
+{
+	wait_on_page_fs_misc(page);
+	return 0;
+}
+#endif
--- 2.6.12-rc5-mm2/fs/nfs/write.c.orig	2005-06-05 11:44:35.000000000 -0400
+++ 2.6.12-rc5-mm2/fs/nfs/write.c	2005-06-05 11:44:48.000000000 -0400
@@ -255,6 +255,38 @@ static int wb_priority(struct writeback_
 }
 
 /*
+ * store an updated page in fscache
+ */
+#ifdef CONFIG_NFS_FSCACHE
+static void
+nfs_writepage_to_fscache_complete(void *cookie_data, struct page *page, void *data, int error)
+{
+	/* really need to synchronise the end of writeback, probably using a page flag */
+}
+static inline void
+nfs_writepage_to_fscache(struct inode *inode, struct page *page)
+{
+	int ret; 
+
+	dprintk("NFS: writepage_to_fscache (0x%p/0x%p/0x%p)\n",
+		NFS_I(inode)->fscache, page, inode);
+
+	ret =  fscache_write_page(NFS_I(inode)->fscache, page,
+		nfs_writepage_to_fscache_complete, NULL, GFP_KERNEL);
+	if (ret != 0) {
+		dprintk("NFS:    fscache_write_page error %d\n", ret);
+		fscache_uncache_page(NFS_I(inode)->fscache, page);
+	}
+}
+#else
+static inline void
+nfs_writepage_to_fscache(struct inode *inode, struct page *page)
+{
+	BUG();
+}
+#endif
+
+/*
  * Write an mmapped page to the server.
  */
 int nfs_writepage(struct page *page, struct writeback_control *wbc)
@@ -299,6 +331,10 @@ do_it:
 		err = -EBADF;
 		goto out;
 	}
+
+	if (NFS_SERVER(inode)->flags & NFS_MOUNT_FSCACHE)
+		nfs_writepage_to_fscache(inode, page);
+
 	lock_kernel();
 	if (!IS_SYNC(inode) && inode_referenced) {
 		err = nfs_writepage_async(ctx, inode, page, 0, offset);
--- 2.6.12-rc5-mm2/fs/Kconfig.orig	2005-06-05 11:44:35.000000000 -0400
+++ 2.6.12-rc5-mm2/fs/Kconfig	2005-06-05 11:44:48.000000000 -0400
@@ -1495,6 +1495,13 @@ config NFS_V4
 
 	  If unsure, say N.
 
+config NFS_FSCACHE
+	bool "Provide NFS client caching support (EXPERIMENTAL)"
+	depends on NFS_FS && FSCACHE && EXPERIMENTAL
+	help
+	  Say Y here if you want NFS data to be cached locally on disc through
+	  the general filesystem cache manager.
+
 config NFS_DIRECTIO
 	bool "Allow direct I/O on NFS files (EXPERIMENTAL)"
 	depends on NFS_FS && EXPERIMENTAL
--- 2.6.12-rc5-mm2/include/linux/nfs_fs.h.orig	2005-06-05 11:44:35.000000000 -0400
+++ 2.6.12-rc5-mm2/include/linux/nfs_fs.h	2005-06-05 11:44:48.000000000 -0400
@@ -29,6 +29,7 @@
 #include <linux/nfs_xdr.h>
 #include <linux/rwsem.h>
 #include <linux/mempool.h>
+#include <linux/fscache.h>
 
 /*
  * Enable debugging support for nfs client.
@@ -184,6 +185,11 @@ struct nfs_inode {
 	int			 delegation_state;
 	struct rw_semaphore	rwsem;
 #endif /* CONFIG_NFS_V4*/
+
+#ifdef CONFIG_NFS_FSCACHE
+	struct fscache_cookie	*fscache;
+#endif
+
 	struct inode		vfs_inode;
 };
 
@@ -564,6 +570,7 @@ extern void * nfs_root_data(void);
 #define NFSDBG_FILE		0x0040
 #define NFSDBG_ROOT		0x0080
 #define NFSDBG_CALLBACK		0x0100
+#define NFSDBG_FSCACHE		0x0200
 #define NFSDBG_ALL		0xFFFF
 
 #ifdef __KERNEL__
--- 2.6.12-rc5-mm2/include/linux/nfs_fs_sb.h.orig	2005-06-05 11:44:35.000000000 -0400
+++ 2.6.12-rc5-mm2/include/linux/nfs_fs_sb.h	2005-06-05 11:44:48.000000000 -0400
@@ -3,6 +3,7 @@
 
 #include <linux/list.h>
 #include <linux/backing-dev.h>
+#include <linux/fscache.h>
 
 /*
  * NFS client parameters stored in the superblock.
@@ -47,6 +48,10 @@ struct nfs_server {
 						   that are supported on this
 						   filesystem */
 #endif
+
+#ifdef CONFIG_NFS_FSCACHE
+	struct fscache_cookie	*fscache;	/* cache cookie */
+#endif
 };
 
 /* Server capabilities */
--- 2.6.12-rc5-mm2/include/linux/nfs_mount.h.orig	2005-06-05 11:44:35.000000000 -0400
+++ 2.6.12-rc5-mm2/include/linux/nfs_mount.h	2005-06-05 11:44:48.000000000 -0400
@@ -61,6 +61,7 @@ struct nfs_mount_data {
 #define NFS_MOUNT_NOACL		0x0800	/* 4 */
 #define NFS_MOUNT_STRICTLOCK	0x1000	/* reserved for NFSv4 */
 #define NFS_MOUNT_SECFLAVOUR	0x2000	/* 5 */
+#define NFS_MOUNT_FSCACHE	0x4000
 #define NFS_MOUNT_FLAGMASK	0xFFFF
 
 #endif
--- 2.6.12-rc5-mm2/include/linux/nfs4_mount.h.orig	2005-06-05 11:44:35.000000000 -0400
+++ 2.6.12-rc5-mm2/include/linux/nfs4_mount.h	2005-06-05 11:44:48.000000000 -0400
@@ -65,6 +65,7 @@ struct nfs4_mount_data {
 #define NFS4_MOUNT_NOCTO	0x0010	/* 1 */
 #define NFS4_MOUNT_NOAC		0x0020	/* 1 */
 #define NFS4_MOUNT_STRICTLOCK	0x1000	/* 1 */
+#define NFS4_MOUNT_FSCACHE	0x2000	/* 1 */
 #define NFS4_MOUNT_FLAGMASK	0xFFFF
 
 #endif
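The cache-read path above relies on a tri-state return convention: a negative value is a hard error, 0 means the read was submitted to the cache (the page will be unlocked by the completion callback), and 1 means a cache miss, so the caller falls back to reading over the wire. A minimal user-space sketch of that mapping (not kernel code; `map_fscache_result` is a hypothetical name mirroring the switch in `nfs_readpage_from_fscache`):

```c
#include <errno.h>

/* Hypothetical user-space model of the return-value convention used by
 * nfs_readpage_from_fscache():
 *   < 0 -> hard error, abort the read
 *     0 -> read submitted to the cache; page unlocked on completion
 *     1 -> cache miss, fall back to reading from the server
 */
static int map_fscache_result(int ret)
{
	switch (ret) {
	case 0:			/* read submitted (page found in fscache) */
		return 0;
	case -ENOBUFS:		/* inode not in cache */
	case -ENODATA:		/* page not in cache */
		return 1;	/* read over the wire instead */
	default:
		return ret;	/* genuine I/O error, propagate */
	}
}
```

Callers such as `nfs_readpage` and `readpage_async_filler` then only need to test `< 0` and `== 0` before issuing the network read.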

[-- Attachment #3: 2.6.12-rc6-mm1-fscache-cookie-exist.patch --]
[-- Type: text/x-patch, Size: 669 bytes --]

Fails a second NFS mount with EEXIST instead of oopsing.

Signed-off-by: Steve Dickson <steved@redhat.com>

--- 2.6.12-rc3-mm3/fs/fscache/cookie.c.orig	2005-05-07 09:30:28.000000000 -0400
+++ 2.6.12-rc3-mm3/fs/fscache/cookie.c	2005-05-07 11:01:39.000000000 -0400
@@ -452,7 +452,11 @@ static int fscache_search_for_object(str
 		cache->ops->lock_node(node);
 
 		/* a node should only ever be attached to one cookie */
-		BUG_ON(!list_empty(&node->cookie_link));
+		if (!list_empty(&node->cookie_link)) {
+			cache->ops->unlock_node(node);
+			ret = -EEXIST;
+			goto error;
+		}
 
 		/* attach the node to the cache's node list */
 		if (list_empty(&node->cache_link)) {
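The shape of this fix is the usual BUG_ON-to-error-return conversion: instead of panicking when a cache node is already attached to a cookie, the lookup drops the node lock and reports -EEXIST, so the second mount fails cleanly. A stand-alone sketch of the pattern (not the kernel code; `struct node` and `attach_node` are hypothetical stand-ins for the locked node and `fscache_search_for_object`):

```c
#include <errno.h>

/* Hypothetical model of the fix: fail gracefully with -EEXIST rather
 * than BUG() when the node is already attached, and make sure the lock
 * taken on entry is released on the error path too. */
struct node {
	int attached;	/* stands in for !list_empty(&node->cookie_link) */
	int locked;	/* stands in for cache->ops->lock_node() being held */
};

static int attach_node(struct node *n)
{
	n->locked = 1;			/* cache->ops->lock_node(node) */
	if (n->attached) {		/* was: BUG_ON(!list_empty(...)) */
		n->locked = 0;		/* cache->ops->unlock_node(node) */
		return -EEXIST;		/* caller sees a failed mount */
	}
	n->attached = 1;
	n->locked = 0;
	return 0;
}
```

The unlock on the error path matters: the original BUG() fired while the node lock was held, so any graceful exit has to undo it before returning.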

[-- Attachment #4: 2.6.12-rc6-mm1-cachefs-wb.patch --]
[-- Type: text/x-patch, Size: 594 bytes --]

This fixes a BUG() popping at mm/filemap.c:465 when reading a
100MB file over NFSv4.

Signed-off-by: Steve Dickson <steved@redhat.com>


--- 2.6.12-rc2-mm3/fs/cachefs/journal.c.save	2005-04-27 08:06:03.000000000 -0400
+++ 2.6.12-rc2-mm3/fs/cachefs/journal.c	2005-05-03 11:11:17.000000000 -0400
@@ -682,6 +682,7 @@ static inline void cachefs_trans_batch_p
 		list_add_tail(&block->batch_link, plist);
 		block->writeback = block->page;
 		get_page(block->writeback);
+		SetPageWriteback(block->writeback);
 
 		/* make sure DMA can reach the data */
 		flush_dcache_page(block->writeback);


Thread overview: 10+ messages
2005-05-09 10:31 NFS Patch for FSCache Steve Dickson
2005-05-09 21:19 ` Andrew Morton
2005-05-10 18:43   ` Steve Dickson
2005-05-10 19:12   ` [Linux-cachefs] " David Howells
2005-05-14  2:18     ` Troy Benjegerdes
2005-05-16 13:30     ` David Howells
2005-06-13 12:52 ` Steve Dickson
  -- strict thread matches above, loose matches on Subject: below --
2005-05-17 21:42 David Masover
2005-05-14  2:08 ` Troy Benjegerdes
2005-05-12 22:43   ` [Linux-cachefs] " Lever, Charles
2005-05-13 11:17     ` David Howells
2005-05-16 12:47       ` [Linux-cachefs] " David Howells
2005-05-18 10:28         ` David Howells
