[RFC] [PATCH v2 0/1] nfsd: Improve NFS server performance

Linux NFS development
 help / color / mirror / Atom feed

* [RFC] [PATCH v2 0/1] nfsd: Improve NFS server performance
@ 2009-01-24 12:34 Krishna Kumar
       [not found] ` <20090124123457.10995.57636.sendpatchset-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Krishna Kumar @ 2009-01-24 12:34 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs, Krishna Kumar

From: Krishna Kumar <krkumar2@in.ibm.com>

			Patch summary:
			--------------

Change the caching from ino/dev to a file handle model. Advantages:

1. Since file handles are unique, this patch removes all dependencies on the
   kernel readahead parameters and it's implementation; and instead caches
   files based on file handles.

2. Allows the server to not have to open/close a file multiple times when the
   client reads it, and results in faster lookup times.

3. Readahead is automatically taken care of since the file is not closed while
   it is getting read (quickly) by the client.

4. Another optimization is to avoid getting the cache lock twice for each read
   operation (after the first time it is taken to update the cache).

			List of changes from Ver1:
			--------------------------

1. Implement entire logic of either updating-cache or do-nothing or close-file
   in fh_cache_upd. This simplifies the caller (nfsd_read).

2. nfsd_get_fhcache doesn't overwrite existing entries which would require the
   existing entries to be freed up - that is done by the daemon exclusively.
   This saves time at lookup by avoiding freeing entries (fh_cache_put).
   Another change is to optimize the logic of selecting a free slot.

3. Due to #2, fh_cache_upd doesn't have to test whether the entry is already
   on the daemon list (it never is, plus list_del_init is changed to list_del).

4. As a result of #2, the daemon becomes simpler - there is no race to handle
   where there is an entry on the list but it has no cached file/dentry, etc.

5. Made some comments clearer and easier to understand.

			Performance:
			-------------

This patch was tested with clients running 2, 4, 8, 16 --- 256 test processes,
each doing reads of different files. Each test includes different I/O sizes.
31/77 tests got more than 5% improvement; with 5 tests getting more than 10%
gain (full results at the end of this post).

Please review. Any comments or improvement ideas are appreciated.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---

		(#Test Processes on Client == #NFSD's on Server)
-----------------------------------------------------------------------
#Test Processes Bufsize		Org BW KB/s	New BW KB/s	%
-----------------------------------------------------------------------
2		256		68022.46	71094.64	4.51
2		4096		67833.74	70726.38	4.26
2		8192		64541.14	69635.93	7.89
2		16384		65708.86	68994.88	5.00
2		32768		64272.28	68525.36	6.61
2		65536		64684.13	69103.28	6.83
2		131072		64765.67	68855.57	6.31

4		256		60849.48	64702.04	6.33
4		4096		60660.67	64309.37	6.01
4		8192		60506.00	61142.84	1.05
4		16384		60796.86	64069.82	5.38
4		32768		60947.07	64648.57	6.07
4		65536		60774.45	63735.24	4.87
4		131072		61369.66	65261.85	6.34

8		256		49239.57	54467.33	10.61
8		4096		50650.45	55400.01	9.37
8		8192		50661.58	51732.16	2.11
8		16384		51114.76	56025.31	9.60
8		32768		52367.20	54348.05	3.78
8		65536		51000.23	54285.63	6.44
8		131072		52996.73	54021.11	1.93

16		256		44534.67	45478.60	2.11
16		4096		43897.69	46519.89	5.97
16		8192		43787.87	44083.61	.67
16		16384		43883.62	46726.03	6.47
16		32768		44284.96	44035.86	-.56
16		65536		43804.33	44865.20	2.42
16		131072		44525.30	43752.62	-1.73

32		256		40420.30	42575.30	5.33
32		4096		39913.14	42279.21	5.92
32		8192		40261.19	42399.93	5.31
32		16384		38094.95	42645.32	11.94
32		32768		40610.27	43015.37	5.92
32		65536		41438.05	41794.76	.86
32		131072		41869.06	43644.07	4.23

48		256		36986.45	40185.34	8.64
48		4096		36585.79	38227.38	4.48
48		8192		38406.78	38055.91	-.91
48		16384		34950.05	36688.86	4.97
48		32768		38908.71	37900.33	-2.59
48		65536		39364.64	40036.67	1.70
48		131072		40391.56	40887.11	1.22

64		256		32821.89	34568.06	5.32
64		4096		35468.42	35529.29	.17
64		8192		34135.44	36462.31	6.81
64		16384		31926.51	32694.91	2.40
64		32768		35527.69	35234.60	-.82
64		65536		36066.08	36359.77	.81
64		131072		35969.04	37462.86	4.15

96		256		30032.74	29973.45	-.19
96		4096		29687.06	30881.64	4.02
96		8192		31142.51	32504.66	4.37
96		16384		29546.77	30663.39	3.77
96		32768		32458.94	32797.36	1.04
96		65536		32826.99	33451.66	1.90
96		131072		33601.46	34177.39	1.71

128		256		28584.59	29092.11	1.77
128		4096		29311.11	30097.65	2.68
128		8192		31398.87	33154.63	5.59
128		16384		28283.58	29071.45	2.78
128		32768		32819.93	33654.11	2.54
128		65536		32617.13	33778.46	3.56
128		131072		32972.71	34160.82	3.60

192		256		25245.92	26331.48	4.29
192		4096		27368.48	29318.49	7.12
192		8192		30173.74	31477.35	4.32
192		16384		26388.54	29719.15	12.62
192		32768		31840.91	33606.17	5.54
192		65536		33374.85	33607.14	.69
192		131072		33523.48	32601.93	-2.74

256		256		22256.91	21139.79	-5.01
256		4096		25192.75	24281.51	-3.61
256		8192		26534.95	28100.59	5.90
256		16384		24162.85	25607.86	5.98
256		32768		29218.38	29417.28	.68
256		65536		29609.59	30049.79	1.48
256		131072		30014.29	30132.33	.39
-----------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 5+ messages in thread

[parent not found: <20090124123457.10995.57636.sendpatchset-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>]

* [RFC] [PATCH v2 1/1]: nfsd: Change caching from ino/dev to file-handle
       [not found] ` <20090124123457.10995.57636.sendpatchset-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
@ 2009-01-24 12:35   ` Krishna Kumar
       [not found]     ` <20090124123511.10995.88449.sendpatchset-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Krishna Kumar @ 2009-01-24 12:35 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs, Krishna Kumar

From: Krishna Kumar <krkumar2@in.ibm.com>

Implement the file handle caching. List of changes:

	1. Remove all implementation and users of readahead, rename RA to FH,
           parm to cache.
	2. Add fields in the fhparms to cache file, svc_export, expiry list
	   and expiry time. Modify some other fields (eg p_count is atomic).
	3. Implement a daemon to clean up cached FH's.
	4. Added four helper functions:
		fh_cache_get: Hold a reference to dentry and svc_export.
		fh_cache_put: Drop a reference to file, dentry and svc_export.
		fh_get_cached_entries: Returns cached file and svc_export.
		fh_cache_upd: Cache file and svc_export. Add entry to list for
			      daemon to cleanup.
	5. nfsd_get_raparms is slightly rewritten (changed to nfsd_get_fhcache).
	6. nfsd_read rewritten to use the cache, remove RA from nfsd_vfs_read.
	7. File remove operation from the client results in the server checking
	   the cache and drops reference immediately (remove operation on the
	   server still retains the reference for a short time before the
	   daemon frees up the entry).
	8. init and shutdown are slightly modified.

	(ra_size, ra_depth, nfsd_racache_init and nfsd_racache_shutdown still
	   retain the "ra" prefix for now)


Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
 fs/nfsd/vfs.c |  429 +++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 338 insertions(+), 91 deletions(-)

diff -ruNp 2.6.29-rc2.org/fs/nfsd/vfs.c 2.6.29-rc2.new/fs/nfsd/vfs.c
--- 2.6.29-rc2.org/fs/nfsd/vfs.c	2009-01-22 13:23:18.000000000 +0530
+++ 2.6.29-rc2.new/fs/nfsd/vfs.c	2009-01-22 13:23:18.000000000 +0530
@@ -55,38 +55,53 @@
 #include <linux/security.h>
 #endif /* CONFIG_NFSD_V4 */
 #include <linux/jhash.h>
+#include <linux/kthread.h>
 
 #include <asm/uaccess.h>
 
 #define NFSDDBG_FACILITY		NFSDDBG_FILEOP
 
+/* Number of jiffies to cache the file before releasing */
+#define NFSD_CACHE_JIFFIES		100
 
 /*
- * This is a cache of readahead params that help us choose the proper
- * readahead strategy. Initially, we set all readahead parameters to 0
- * and let the VFS handle things.
+ * This is a cache of file handles to quicken file lookup. This also
+ * helps prevent multiple open/close of a file when a client reads it.
+ *
  * If you increase the number of cached files very much, you'll need to
  * add a hash table here.
  */
-struct raparms {
-	struct raparms		*p_next;
-	unsigned int		p_count;
-	ino_t			p_ino;
-	dev_t			p_dev;
-	int			p_set;
-	struct file_ra_state	p_ra;
+struct fhcache {
+	struct fhcache		*p_next;
+
+	/* Hashed on this parameter */
+	__u32			p_auth;
+
+	/* Cached information */
+	struct file		*p_filp;
+	struct svc_export	*p_exp;
+
+	/* Refcount for overwrite */
+	atomic_t		p_count;
+
+	/* When this entry expires */
+	unsigned long		p_expires;
+
+	/* List of entries linked to 'nfsd_daemon_list' */
+	struct list_head	p_list;
+
 	unsigned int		p_hindex;
 };
 
-struct raparm_hbucket {
-	struct raparms		*pb_head;
+struct fhcache_hbucket {
+	struct fhcache		*pb_head;
 	spinlock_t		pb_lock;
 } ____cacheline_aligned_in_smp;
 
-#define RAPARM_HASH_BITS	4
-#define RAPARM_HASH_SIZE	(1<<RAPARM_HASH_BITS)
-#define RAPARM_HASH_MASK	(RAPARM_HASH_SIZE-1)
-static struct raparm_hbucket	raparm_hash[RAPARM_HASH_SIZE];
+#define FHPARM_HASH_BITS	8
+#define FHPARM_HASH_SIZE	(1<<FHPARM_HASH_BITS)
+#define FHPARM_HASH_MASK	(FHPARM_HASH_SIZE-1)
+static struct fhcache_hbucket	fhcache_hash[FHPARM_HASH_SIZE];
 
 /* 
  * Called from nfsd_lookup and encode_dirent. Check if we have crossed 
@@ -784,51 +799,223 @@ nfsd_sync_dir(struct dentry *dp)
 	return nfsd_dosync(NULL, dp, dp->d_inode->i_fop);
 }
 
+/* Daemon to handle expired fh cache entries */
+static struct task_struct	*k_nfsd_task;
+
+/* Synchronization for daemon with enqueuer's */
+static DEFINE_SPINLOCK(k_nfsd_lock);
+
+/* List of FH cache entries that has to be cleaned up when they expire */
+static struct list_head		nfsd_daemon_list;
+
+/*
+ * Returns cached values of 'file' and svc_export; resets these entries
+ * to NULL.
+ */
+static inline void fh_get_cached_entries(struct fhcache *fh,
+					 struct file **filep,
+					 struct svc_export **expp)
+{
+	*filep = fh->p_filp;
+	*expp = fh->p_exp;
+
+	fh->p_filp = NULL;
+	fh->p_exp = NULL;
+}
+
+/*
+ * Hold a reference to dentry and svc_export (file already has an extra
+ * reference count as it is not closed normally.
+ */
+static inline void fh_cache_get(struct file *file, struct svc_export *exp)
+{
+	dget(file->f_dentry);
+	cache_get(&exp->h);
+}
+
+/* Drop a reference to file, dentry and svc_export */
+static inline void fh_cache_put(struct file *file, struct svc_export *exp)
+{
+	cache_put(&exp->h, &svc_export_cache);
+	dput(file->f_dentry);
+	fput(file);
+}
+
 /*
- * Obtain the readahead parameters for the file
- * specified by (dev, ino).
+ * Holds a reference to dentry and svc_export, and caches both. Add fh entry
+ * to list for daemon to cleanup later. Once we add the entry to the list,
+ * we'd rather it expire prematurely rather than updating it on every read.
  */
+static inline void fh_cache_upd(struct fhcache *fh, struct file *file,
+				struct svc_export *exp)
+{
+	if (fh) {
+		if (!fh->p_filp && file) {
+			struct fhcache_hbucket *fhb;
+
+			fh_cache_get(file, exp);
+
+			fhb = &fhcache_hash[fh->p_hindex];
+
+			spin_lock(&fhb->pb_lock);
+			fh->p_filp = file;
+			fh->p_exp = exp;
+			fh->p_expires = jiffies + NFSD_CACHE_JIFFIES;
 
-static inline struct raparms *
-nfsd_get_raparms(dev_t dev, ino_t ino)
+			spin_lock(&k_nfsd_lock);
+			list_add_tail(&fh->p_list, &nfsd_daemon_list);
+			spin_unlock(&k_nfsd_lock);
+
+			spin_unlock(&fhb->pb_lock);
+		}
+
+		/* Drop our reference */
+		atomic_dec(&fh->p_count);
+	} else if (file)
+		nfsd_close(file);
+}
+
+/* Daemon cache cleanup handler */
+void daemon_free_entries(void)
 {
-	struct raparms	*ra, **rap, **frap = NULL;
-	int depth = 0;
-	unsigned int hash;
-	struct raparm_hbucket *rab;
+	unsigned long now = jiffies;
+
+	spin_lock(&k_nfsd_lock);
+	while (!list_empty(&nfsd_daemon_list)) {
+		struct fhcache *fh = list_entry(nfsd_daemon_list.next,
+                                                struct fhcache, p_list);
+		struct fhcache_hbucket *fhb;
+		struct file *file;
+		struct svc_export *exp;
+
+		if (time_after(fh->p_expires, now) || now != jiffies) {
+			/*
+			 * This (and all subsequent entries) have not expired;
+			 * or we have spent too long in this loop.
+			 */
+			break;
+		}
+
+		fhb = &fhcache_hash[fh->p_hindex];
+
+		/*
+		 * Make sure we do not deadlock with updaters - we can free
+		 * entry next time in case of a race.
+		 */
+		if (!spin_trylock(&fhb->pb_lock)) {
+			/*
+			 * Entry is being used, no need to free this, try later
+			 */
+			break;
+		}
+
+		if (atomic_read(&fh->p_count)) {
+			spin_unlock(&fhb->pb_lock);
+			break;
+		}
 
-	hash = jhash_2words(dev, ino, 0xfeedbeef) & RAPARM_HASH_MASK;
-	rab = &raparm_hash[hash];
+		list_del(&fh->p_list);
+		fh_get_cached_entries(fh, &file, &exp);
+		spin_unlock(&fhb->pb_lock);
+		spin_unlock(&k_nfsd_lock);
+
+		fh_cache_put(file, exp);
+		spin_lock(&k_nfsd_lock);
+	}
+	spin_unlock(&k_nfsd_lock);
+}
+
+static int k_nfsd_thread(void *unused)
+{
+	while (!kthread_should_stop()) {
+		schedule_timeout_interruptible(NFSD_CACHE_JIFFIES);
 
-	spin_lock(&rab->pb_lock);
-	for (rap = &rab->pb_head; (ra = *rap); rap = &ra->p_next) {
-		if (ra->p_ino == ino && ra->p_dev == dev)
+		if (kthread_should_stop())
+			break;
+
+		daemon_free_entries();
+	}
+	__set_current_state(TASK_RUNNING);
+
+	return 0;
+}
+
+/*
+ * Obtain the cached file, export and d_inode values for the FH
+ * specified by fh->auth[3]
+ */
+static inline struct fhcache *
+nfsd_get_fhcache(__u32 auth)
+{
+	struct fhcache		*fh, **fhp, **ffhp = NULL;
+	int			depth = 0;
+	unsigned int		hash;
+	struct fhcache_hbucket	*fhb;
+
+	if (!auth)
+		return NULL;
+
+	hash = jhash_1word(auth, 0xfeedbeef) & FHPARM_HASH_MASK;
+	fhb = &fhcache_hash[hash];
+
+	spin_lock(&fhb->pb_lock);
+	for (fhp = &fhb->pb_head; (fh = *fhp); fhp = &fh->p_next) {
+		if (fh->p_auth == auth) {
+			/* Same inode */
+			if (unlikely(!fh->p_filp)) {
+				/*
+				 * Someone is racing in the same code, and
+				 * this is the first reference to this file.
+				 */
+				spin_unlock(&fhb->pb_lock);
+				return NULL;
+			}
+
+			/*
+			 * Hold an extra reference to dentry/exp since these
+			 * are released in fh_put(). 'file' already has an
+			 * extra hold from the first lookup which was never
+			 * dropped.
+			 */
+			fh_cache_get(fh->p_filp, fh->p_exp);
 			goto found;
+		}
+
 		depth++;
-		if (ra->p_count == 0)
-			frap = rap;
+
+		if (!ffhp && !fh->p_filp) {
+			/*
+			 * This is an unused inode (or a different one), and no
+			 * entry was found till now.
+			 */
+			if (!atomic_read(&fh->p_count)) /* Entry is unused */
+				ffhp = fhp;
+		}
 	}
-	depth = nfsdstats.ra_size*11/10;
-	if (!frap) {	
-		spin_unlock(&rab->pb_lock);
+
+	if (!ffhp) {
+		spin_unlock(&fhb->pb_lock);
 		return NULL;
 	}
-	rap = frap;
-	ra = *frap;
-	ra->p_dev = dev;
-	ra->p_ino = ino;
-	ra->p_set = 0;
-	ra->p_hindex = hash;
+
+	depth = nfsdstats.ra_size*11/10;
+	fhp = ffhp;
+	fh = *ffhp;
+	fh->p_hindex = hash;
+	fh->p_auth = auth;
+
 found:
-	if (rap != &rab->pb_head) {
-		*rap = ra->p_next;
-		ra->p_next   = rab->pb_head;
-		rab->pb_head = ra;
+	if (fhp != &fhb->pb_head) {
+		*fhp = fh->p_next;
+		fh->p_next   = fhb->pb_head;
+		fhb->pb_head = fh;
 	}
-	ra->p_count++;
+
+	atomic_inc(&fh->p_count);
 	nfsdstats.ra_depth[depth*10/nfsdstats.ra_size]++;
-	spin_unlock(&rab->pb_lock);
-	return ra;
+	spin_unlock(&fhb->pb_lock);
+
+	return fh;
 }
 
 /*
@@ -892,7 +1079,6 @@ nfsd_vfs_read(struct svc_rqst *rqstp, st
               loff_t offset, struct kvec *vec, int vlen, unsigned long *count)
 {
 	struct inode *inode;
-	struct raparms	*ra;
 	mm_segment_t	oldfs;
 	__be32		err;
 	int		host_err;
@@ -903,11 +1089,6 @@ nfsd_vfs_read(struct svc_rqst *rqstp, st
 	if (svc_msnfs(fhp) && !lock_may_read(inode, offset, *count))
 		goto out;
 
-	/* Get readahead parameters */
-	ra = nfsd_get_raparms(inode->i_sb->s_dev, inode->i_ino);
-
-	if (ra && ra->p_set)
-		file->f_ra = ra->p_ra;
 
 	if (file->f_op->splice_read && rqstp->rq_splice_ok) {
 		struct splice_desc sd = {
@@ -926,16 +1107,6 @@ nfsd_vfs_read(struct svc_rqst *rqstp, st
 		set_fs(oldfs);
 	}
 
-	/* Write back readahead params */
-	if (ra) {
-		struct raparm_hbucket *rab = &raparm_hash[ra->p_hindex];
-		spin_lock(&rab->pb_lock);
-		ra->p_ra = file->f_ra;
-		ra->p_set = 1;
-		ra->p_count--;
-		spin_unlock(&rab->pb_lock);
-	}
-
 	if (host_err >= 0) {
 		nfsdstats.io_read += host_err;
 		*count = host_err;
@@ -1078,12 +1249,30 @@ nfsd_read(struct svc_rqst *rqstp, struct
 			goto out;
 		err = nfsd_vfs_read(rqstp, fhp, file, offset, vec, vlen, count);
 	} else {
-		err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, &file);
-		if (err)
-			goto out;
-		err = nfsd_vfs_read(rqstp, fhp, file, offset, vec, vlen, count);
-		nfsd_close(file);
+		struct fhcache	*fh;
+
+		/* Check if this fh is cached */
+		fh = nfsd_get_fhcache(fhp->fh_handle.fh_auth[3]);
+		if (fh && fh->p_filp) {
+			/* Got cached values */
+			file = fh->p_filp;
+			fhp->fh_dentry = file->f_dentry;
+			fhp->fh_export = fh->p_exp;
+			err = fh_verify(rqstp, fhp, S_IFREG, NFSD_MAY_READ);
+		} else {
+			/* Nothing in cache */
+			err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ,
+					&file);
+		}
+
+		if (!err)
+			err = nfsd_vfs_read(rqstp, fhp, file, offset, vec, vlen,
+					    count);
+
+		/* Update cached values if required, and clean up */
+		fh_cache_upd(fh, file, fhp->fh_export);
 	}
+
 out:
 	return err;
 }
@@ -1791,6 +1980,39 @@ nfsd_unlink(struct svc_rqst *rqstp, stru
 		goto out_nfserr;
 
 	if (type != S_IFDIR) { /* It's UNLINK */
+		int i, found = 0;
+
+		for (i = 0 ; i < FHPARM_HASH_SIZE && !found; i++) {
+			struct fhcache_hbucket *fhb = &fhcache_hash[i];
+			struct fhcache *fh;
+
+			spin_lock(&fhb->pb_lock);
+			for (fh = fhb->pb_head; fh; fh = fh->p_next) {
+				if (fh->p_filp &&
+				    fh->p_filp->f_dentry == rdentry) {
+					/* Found the entry for removed file */
+					struct file *file;
+					struct svc_export *exp;
+
+					fh_get_cached_entries(fh, &file, &exp);
+
+					spin_lock(&k_nfsd_lock);
+					list_del(&fh->p_list);
+					spin_unlock(&k_nfsd_lock);
+
+					spin_unlock(&fhb->pb_lock);
+
+					/* Drop reference to this entry */
+					fh_cache_put(file, exp);
+
+					spin_lock(&fhb->pb_lock);
+					found = 1;
+					break;
+				}
+			}
+			spin_unlock(&fhb->pb_lock);
+		}
+
 #ifdef MSNFS
 		if ((fhp->fh_export->ex_flags & NFSEXP_MSNFS) &&
 			(atomic_read(&rdentry->d_count) > 1)) {
@@ -2061,23 +2283,36 @@ nfsd_permission(struct svc_rqst *rqstp, 
 void
 nfsd_racache_shutdown(void)
 {
-	struct raparms *raparm, *last_raparm;
 	unsigned int i;
 
-	dprintk("nfsd: freeing readahead buffers.\n");
+	dprintk("nfsd: freeing FH buffers.\n");
 
-	for (i = 0; i < RAPARM_HASH_SIZE; i++) {
-		raparm = raparm_hash[i].pb_head;
-		while(raparm) {
-			last_raparm = raparm;
-			raparm = raparm->p_next;
-			kfree(last_raparm);
+	/* Stop the daemon and free the list entries */
+        kthread_stop(k_nfsd_task);
+        k_nfsd_task = NULL;
+
+	for (i = 0; i < FHPARM_HASH_SIZE; i++) {
+		struct fhcache *fhcache, *last_fhcache;
+
+		fhcache = fhcache_hash[i].pb_head;
+		while(fhcache) {
+			last_fhcache = fhcache;
+			if (fhcache->p_filp) {
+				struct file *file;
+				struct svc_export *exp;
+
+				fh_get_cached_entries(fhcache, &file, &exp);
+				list_del(&fhcache->p_list);
+				fh_cache_put(file, exp);
+			}
+			fhcache = fhcache->p_next;
+			kfree(last_fhcache);
 		}
-		raparm_hash[i].pb_head = NULL;
+		fhcache_hash[i].pb_head = NULL;
 	}
 }
 /*
- * Initialize readahead param cache
+ * Initialize file handle cache
  */
 int
 nfsd_racache_init(int cache_size)
@@ -2085,36 +2320,48 @@ nfsd_racache_init(int cache_size)
 	int	i;
 	int	j = 0;
 	int	nperbucket;
-	struct raparms **raparm = NULL;
+	struct fhcache **fhcache = NULL;
 
 
-	if (raparm_hash[0].pb_head)
+	if (fhcache_hash[0].pb_head)
 		return 0;
-	nperbucket = DIV_ROUND_UP(cache_size, RAPARM_HASH_SIZE);
+	nperbucket = DIV_ROUND_UP(cache_size, FHPARM_HASH_SIZE);
 	if (nperbucket < 2)
 		nperbucket = 2;
-	cache_size = nperbucket * RAPARM_HASH_SIZE;
+	cache_size = nperbucket * FHPARM_HASH_SIZE;
 
-	dprintk("nfsd: allocating %d readahead buffers.\n", cache_size);
+	dprintk("nfsd: allocating %d file handle cache buffers.\n", cache_size);
 
-	for (i = 0; i < RAPARM_HASH_SIZE; i++) {
-		spin_lock_init(&raparm_hash[i].pb_lock);
+	for (i = 0; i < FHPARM_HASH_SIZE; i++) {
+		spin_lock_init(&fhcache_hash[i].pb_lock);
 
-		raparm = &raparm_hash[i].pb_head;
+		fhcache = &fhcache_hash[i].pb_head;
 		for (j = 0; j < nperbucket; j++) {
-			*raparm = kzalloc(sizeof(struct raparms), GFP_KERNEL);
-			if (!*raparm)
+			*fhcache = kzalloc(sizeof(struct fhcache), GFP_KERNEL);
+			if (!*fhcache) {
+				dprintk("nfsd: kmalloc failed, freeing file cache buffers\n");
 				goto out_nomem;
-			raparm = &(*raparm)->p_next;
+			}
+			INIT_LIST_HEAD(&(*fhcache)->p_list);
+			fhcache = &(*fhcache)->p_next;
 		}
-		*raparm = NULL;
+		*fhcache = NULL;
 	}
 
 	nfsdstats.ra_size = cache_size;
+
+	INIT_LIST_HEAD(&nfsd_daemon_list);
+	k_nfsd_task = kthread_run(k_nfsd_thread, NULL, "nfsd_cacher");
+
+	if (IS_ERR(k_nfsd_task)) {
+		printk(KERN_ERR "%s: unable to create kernel thread: %ld\n",
+		       __FUNCTION__, PTR_ERR(k_nfsd_task));
+		goto out_nomem;
+	}
+
 	return 0;
 
 out_nomem:
-	dprintk("nfsd: kmalloc failed, freeing readahead buffers\n");
 	nfsd_racache_shutdown();
 	return -ENOMEM;
 }

^ permalink raw reply	[flat|nested] 5+ messages in thread

[parent not found: <20090124123511.10995.88449.sendpatchset-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>]

* Re: [RFC] [PATCH v2 1/1]: nfsd: Change caching from ino/dev to file-handle
       [not found]     ` <20090124123511.10995.88449.sendpatchset-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
@ 2009-02-20 23:49       ` Jeff Layton
       [not found]         ` <20090220154903.1e0c6952-PC62bkCOHzGdMjc06nkz3ljfA9RmPOcC@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff Layton @ 2009-02-20 23:49 UTC (permalink / raw)
  To: Krishna Kumar; +Cc: bfields, linux-nfs

On Sat, 24 Jan 2009 18:05:11 +0530
Krishna Kumar <krkumar2@in.ibm.com> wrote:

> From: Krishna Kumar <krkumar2@in.ibm.com>
> 
> Implement the file handle caching. List of changes:
> 
> 	1. Remove all implementation and users of readahead, rename RA to FH,
>            parm to cache.
> 	2. Add fields in the fhparms to cache file, svc_export, expiry list
> 	   and expiry time. Modify some other fields (eg p_count is atomic).
> 	3. Implement a daemon to clean up cached FH's.
> 	4. Added four helper functions:
> 		fh_cache_get: Hold a reference to dentry and svc_export.
> 		fh_cache_put: Drop a reference to file, dentry and svc_export.
> 		fh_get_cached_entries: Returns cached file and svc_export.
> 		fh_cache_upd: Cache file and svc_export. Add entry to list for
> 			      daemon to cleanup.
> 	5. nfsd_get_raparms is slightly rewritten (changed to nfsd_get_fhcache).
> 	6. nfsd_read rewritten to use the cache, remove RA from nfsd_vfs_read.
> 	7. File remove operation from the client results in the server checking
> 	   the cache and drops reference immediately (remove operation on the
> 	   server still retains the reference for a short time before the
> 	   daemon frees up the entry).
> 	8. init and shutdown are slightly modified.
> 
> 	(ra_size, ra_depth, nfsd_racache_init and nfsd_racache_shutdown still
> 	   retain the "ra" prefix for now)
> 
> 
> Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
> ---
>  fs/nfsd/vfs.c |  429 +++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 338 insertions(+), 91 deletions(-)
> 

Hi Krishna. Sorry for the slow response on my part as well. First, some
general comments:

It would be nice to break up this patchset some. It's very hard to
review large-scale changes like this, and this is quite frankly not
code that is often touched by human hands. Being able to do a git
bisect if we run across bugs after these changes would be a very nice
thing. It won't be helpful though unless the changes are in smaller
pieces.

As a general guideline you'll want to try to do the changes in smaller
pieces, but each patch should leave the tree in a working state. It's
hard to be more specific than that without delving more deeply (which I
don't have time to do at the moment).

In your earlier discussion with Bruce, you mentioned trying to
determine when to flush the cache. When the exports table is changed
via exportfs, the exports kernel cache is also flushed. Hooking into
that might be the best thing...

I'd also go ahead and get rid of the ra_ prefixes unless you feel
they're needed. It'd be best to clean this up so that we don't have to
muck around in here later.

More comments below:

> diff -ruNp 2.6.29-rc2.org/fs/nfsd/vfs.c 2.6.29-rc2.new/fs/nfsd/vfs.c
> --- 2.6.29-rc2.org/fs/nfsd/vfs.c	2009-01-22 13:23:18.000000000 +0530
> +++ 2.6.29-rc2.new/fs/nfsd/vfs.c	2009-01-22 13:23:18.000000000 +0530
> @@ -55,38 +55,53 @@
>  #include <linux/security.h>
>  #endif /* CONFIG_NFSD_V4 */
>  #include <linux/jhash.h>
> +#include <linux/kthread.h>
>  
>  #include <asm/uaccess.h>
>  
>  #define NFSDDBG_FACILITY		NFSDDBG_FILEOP
>  
> +/* Number of jiffies to cache the file before releasing */
> +#define NFSD_CACHE_JIFFIES		100
>  

^^^^
It'd probably be better to express this in terms of HZ so that this is
a fixed amount of time regardless of how the kernel is compiled.

>  /*
> - * This is a cache of readahead params that help us choose the proper
> - * readahead strategy. Initially, we set all readahead parameters to 0
> - * and let the VFS handle things.
> + * This is a cache of file handles to quicken file lookup. This also
> + * helps prevent multiple open/close of a file when a client reads it.
> + *
>   * If you increase the number of cached files very much, you'll need to
>   * add a hash table here.
>   */
> -struct raparms {
> -	struct raparms		*p_next;
> -	unsigned int		p_count;
> -	ino_t			p_ino;
> -	dev_t			p_dev;
> -	int			p_set;
> -	struct file_ra_state	p_ra;
> +struct fhcache {
> +	struct fhcache		*p_next;
> +
> +	/* Hashed on this parameter */
> +	__u32			p_auth;
> +
> +	/* Cached information */
> +	struct file		*p_filp;
> +	struct svc_export	*p_exp;
> +
> +	/* Refcount for overwrite */
> +	atomic_t		p_count;
> +
> +	/* When this entry expires */
> +	unsigned long		p_expires;
> +
> +	/* List of entries linked to 'nfsd_daemon_list' */
> +	struct list_head	p_list;
> +
>  	unsigned int		p_hindex;
>  };
>  
> -struct raparm_hbucket {
> -	struct raparms		*pb_head;
> +struct fhcache_hbucket {
> +	struct fhcache		*pb_head;
>  	spinlock_t		pb_lock;
>  } ____cacheline_aligned_in_smp;
>  
> -#define RAPARM_HASH_BITS	4
> -#define RAPARM_HASH_SIZE	(1<<RAPARM_HASH_BITS)
> -#define RAPARM_HASH_MASK	(RAPARM_HASH_SIZE-1)
> -static struct raparm_hbucket	raparm_hash[RAPARM_HASH_SIZE];
> +#define FHPARM_HASH_BITS	8
> +#define FHPARM_HASH_SIZE	(1<<FHPARM_HASH_BITS)
> +#define FHPARM_HASH_MASK	(FHPARM_HASH_SIZE-1)
> +static struct fhcache_hbucket	fhcache_hash[FHPARM_HASH_SIZE];
>  
>  /* 
>   * Called from nfsd_lookup and encode_dirent. Check if we have crossed 
> @@ -784,51 +799,223 @@ nfsd_sync_dir(struct dentry *dp)
>  	return nfsd_dosync(NULL, dp, dp->d_inode->i_fop);
>  }
>  
> +/* Daemon to handle expired fh cache entries */
> +static struct task_struct	*k_nfsd_task;
> +
> +/* Synchronization for daemon with enqueuer's */
> +static DEFINE_SPINLOCK(k_nfsd_lock);
> +
> +/* List of FH cache entries that has to be cleaned up when they expire */
> +static struct list_head		nfsd_daemon_list;
> +

^^^ some more descriptive variable names would be welcome...

> +/*
> + * Returns cached values of 'file' and svc_export; resets these entries
> + * to NULL.
> + */
> +static inline void fh_get_cached_entries(struct fhcache *fh,
> +					 struct file **filep,
> +					 struct svc_export **expp)
> +{
> +	*filep = fh->p_filp;
> +	*expp = fh->p_exp;
> +
> +	fh->p_filp = NULL;
> +	fh->p_exp = NULL;
> +}
> +
> +/*
> + * Hold a reference to dentry and svc_export (file already has an extra
> + * reference count as it is not closed normally.
> + */
> +static inline void fh_cache_get(struct file *file, struct svc_export *exp)
> +{
> +	dget(file->f_dentry);
> +	cache_get(&exp->h);
> +}
> +
> +/* Drop a reference to file, dentry and svc_export */
> +static inline void fh_cache_put(struct file *file, struct svc_export *exp)
> +{
> +	cache_put(&exp->h, &svc_export_cache);
> +	dput(file->f_dentry);
> +	fput(file);
> +}
> +
>  /*
> - * Obtain the readahead parameters for the file
> - * specified by (dev, ino).
> + * Holds a reference to dentry and svc_export, and caches both. Add fh entry
> + * to list for daemon to cleanup later. Once we add the entry to the list,
> + * we'd rather it expire prematurely rather than updating it on every read.
>   */
> +static inline void fh_cache_upd(struct fhcache *fh, struct file *file,
> +				struct svc_export *exp)
> +{
> +	if (fh) {
> +		if (!fh->p_filp && file) {
> +			struct fhcache_hbucket *fhb;
> +
> +			fh_cache_get(file, exp);
> +
> +			fhb = &fhcache_hash[fh->p_hindex];
> +
> +			spin_lock(&fhb->pb_lock);
> +			fh->p_filp = file;
> +			fh->p_exp = exp;
> +			fh->p_expires = jiffies + NFSD_CACHE_JIFFIES;
>  
> -static inline struct raparms *
> -nfsd_get_raparms(dev_t dev, ino_t ino)
> +			spin_lock(&k_nfsd_lock);
> +			list_add_tail(&fh->p_list, &nfsd_daemon_list);
> +			spin_unlock(&k_nfsd_lock);
> +
> +			spin_unlock(&fhb->pb_lock);
> +		}
> +
> +		/* Drop our reference */
> +		atomic_dec(&fh->p_count);
> +	} else if (file)
> +		nfsd_close(file);
> +}
> +
> +/* Daemon cache cleanup handler */
> +void daemon_free_entries(void)
>  {
> -	struct raparms	*ra, **rap, **frap = NULL;
> -	int depth = 0;
> -	unsigned int hash;
> -	struct raparm_hbucket *rab;
> +	unsigned long now = jiffies;
> +
> +	spin_lock(&k_nfsd_lock);
> +	while (!list_empty(&nfsd_daemon_list)) {
> +		struct fhcache *fh = list_entry(nfsd_daemon_list.next,
> +                                                struct fhcache, p_list);
> +		struct fhcache_hbucket *fhb;
> +		struct file *file;
> +		struct svc_export *exp;
> +
> +		if (time_after(fh->p_expires, now) || now != jiffies) {
> +			/*
> +			 * This (and all subsequent entries) have not expired;
> +			 * or we have spent too long in this loop.
> +			 */
> +			break;
> +		}
> +
> +		fhb = &fhcache_hash[fh->p_hindex];
> +
> +		/*
> +		 * Make sure we do not deadlock with updaters - we can free
> +		 * entry next time in case of a race.
> +		 */
> +		if (!spin_trylock(&fhb->pb_lock)) {
> +			/*
> +			 * Entry is being used, no need to free this, try later
> +			 */
> +			break;
> +		}
> +
> +		if (atomic_read(&fh->p_count)) {
> +			spin_unlock(&fhb->pb_lock);
> +			break;
> +		}
>  
> -	hash = jhash_2words(dev, ino, 0xfeedbeef) & RAPARM_HASH_MASK;
> -	rab = &raparm_hash[hash];
> +		list_del(&fh->p_list);
> +		fh_get_cached_entries(fh, &file, &exp);
> +		spin_unlock(&fhb->pb_lock);
> +		spin_unlock(&k_nfsd_lock);
> +
> +		fh_cache_put(file, exp);
> +		spin_lock(&k_nfsd_lock);
> +	}
> +	spin_unlock(&k_nfsd_lock);
> +}
> +
> +static int k_nfsd_thread(void *unused)
> +{
> +	while (!kthread_should_stop()) {
> +		schedule_timeout_interruptible(NFSD_CACHE_JIFFIES);
>  
> -	spin_lock(&rab->pb_lock);
> -	for (rap = &rab->pb_head; (ra = *rap); rap = &ra->p_next) {
> -		if (ra->p_ino == ino && ra->p_dev == dev)
> +		if (kthread_should_stop())
> +			break;
> +
> +		daemon_free_entries();
> +	}
> +	__set_current_state(TASK_RUNNING);
> +
> +	return 0;
> +}
> +

^^^
...I can't say I'm thrilled about adding a kthread for this but don't
have any specific objections. I wonder if it might be better to just
periodically schedule (and reschedule) delayed work to the events queue
whenever a cache entry is touched?

> +/*
> + * Obtain the cached file, export and d_inode values for the FH
> + * specified by fh->auth[3]
> + */
> +static inline struct fhcache *
> +nfsd_get_fhcache(__u32 auth)
> +{
> +	struct fhcache		*fh, **fhp, **ffhp = NULL;
> +	int			depth = 0;
> +	unsigned int		hash;
> +	struct fhcache_hbucket	*fhb;
> +
> +	if (!auth)
> +		return NULL;
> +
> +	hash = jhash_1word(auth, 0xfeedbeef) & FHPARM_HASH_MASK;
> +	fhb = &fhcache_hash[hash];
> +
> +	spin_lock(&fhb->pb_lock);
> +	for (fhp = &fhb->pb_head; (fh = *fhp); fhp = &fh->p_next) {
> +		if (fh->p_auth == auth) {
> +			/* Same inode */
> +			if (unlikely(!fh->p_filp)) {
> +				/*
> +				 * Someone is racing in the same code, and
> +				 * this is the first reference to this file.
> +				 */
> +				spin_unlock(&fhb->pb_lock);
> +				return NULL;
> +			}
> +
> +			/*
> +			 * Hold an extra reference to dentry/exp since these
> +			 * are released in fh_put(). 'file' already has an
> +			 * extra hold from the first lookup which was never
> +			 * dropped.
> +			 */
> +			fh_cache_get(fh->p_filp, fh->p_exp);
>  			goto found;
> +		}
> +
>  		depth++;
> -		if (ra->p_count == 0)
> -			frap = rap;
> +
> +		if (!ffhp && !fh->p_filp) {
> +			/*
> +			 * This is an unused inode (or a different one), and no
> +			 * entry was found till now.
> +			 */
> +			if (!atomic_read(&fh->p_count)) /* Entry is unused */
> +				ffhp = fhp;
> +		}
>  	}
> -	depth = nfsdstats.ra_size*11/10;
> -	if (!frap) {	
> -		spin_unlock(&rab->pb_lock);
> +
> +	if (!ffhp) {
> +		spin_unlock(&fhb->pb_lock);
>  		return NULL;
>  	}
> -	rap = frap;
> -	ra = *frap;
> -	ra->p_dev = dev;
> -	ra->p_ino = ino;
> -	ra->p_set = 0;
> -	ra->p_hindex = hash;
> +
> +	depth = nfsdstats.ra_size*11/10;
> +	fhp = ffhp;
> +	fh = *ffhp;
> +	fh->p_hindex = hash;
> +	fh->p_auth = auth;
> +
>  found:
> -	if (rap != &rab->pb_head) {
> -		*rap = ra->p_next;
> -		ra->p_next   = rab->pb_head;
> -		rab->pb_head = ra;
> +	if (fhp != &fhb->pb_head) {
> +		*fhp = fh->p_next;
> +		fh->p_next   = fhb->pb_head;
> +		fhb->pb_head = fh;
>  	}
> -	ra->p_count++;
> +
> +	atomic_inc(&fh->p_count);
>  	nfsdstats.ra_depth[depth*10/nfsdstats.ra_size]++;
> -	spin_unlock(&rab->pb_lock);
> -	return ra;
> +	spin_unlock(&fhb->pb_lock);
> +
> +	return fh;
>  }
>  
>  /*
> @@ -892,7 +1079,6 @@ nfsd_vfs_read(struct svc_rqst *rqstp, st
>                loff_t offset, struct kvec *vec, int vlen, unsigned long *count)
>  {
>  	struct inode *inode;
> -	struct raparms	*ra;
>  	mm_segment_t	oldfs;
>  	__be32		err;
>  	int		host_err;
> @@ -903,11 +1089,6 @@ nfsd_vfs_read(struct svc_rqst *rqstp, st
>  	if (svc_msnfs(fhp) && !lock_may_read(inode, offset, *count))
>  		goto out;
>  
> -	/* Get readahead parameters */
> -	ra = nfsd_get_raparms(inode->i_sb->s_dev, inode->i_ino);
> -
> -	if (ra && ra->p_set)
> -		file->f_ra = ra->p_ra;
>  
>  	if (file->f_op->splice_read && rqstp->rq_splice_ok) {
>  		struct splice_desc sd = {
> @@ -926,16 +1107,6 @@ nfsd_vfs_read(struct svc_rqst *rqstp, st
>  		set_fs(oldfs);
>  	}
>  
> -	/* Write back readahead params */
> -	if (ra) {
> -		struct raparm_hbucket *rab = &raparm_hash[ra->p_hindex];
> -		spin_lock(&rab->pb_lock);
> -		ra->p_ra = file->f_ra;
> -		ra->p_set = 1;
> -		ra->p_count--;
> -		spin_unlock(&rab->pb_lock);
> -	}
> -
>  	if (host_err >= 0) {
>  		nfsdstats.io_read += host_err;
>  		*count = host_err;
> @@ -1078,12 +1249,30 @@ nfsd_read(struct svc_rqst *rqstp, struct
>  			goto out;
>  		err = nfsd_vfs_read(rqstp, fhp, file, offset, vec, vlen, count);
>  	} else {
> -		err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, &file);
> -		if (err)
> -			goto out;
> -		err = nfsd_vfs_read(rqstp, fhp, file, offset, vec, vlen, count);
> -		nfsd_close(file);
> +		struct fhcache	*fh;
> +
> +		/* Check if this fh is cached */
> +		fh = nfsd_get_fhcache(fhp->fh_handle.fh_auth[3]);
> +		if (fh && fh->p_filp) {
> +			/* Got cached values */
> +			file = fh->p_filp;
> +			fhp->fh_dentry = file->f_dentry;
> +			fhp->fh_export = fh->p_exp;
> +			err = fh_verify(rqstp, fhp, S_IFREG, NFSD_MAY_READ);
> +		} else {
> +			/* Nothing in cache */
> +			err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ,
> +					&file);
> +		}
> +
> +		if (!err)
> +			err = nfsd_vfs_read(rqstp, fhp, file, offset, vec, vlen,
> +					    count);
> +
> +		/* Update cached values if required, and clean up */
> +		fh_cache_upd(fh, file, fhp->fh_export);
>  	}
> +
>  out:
>  	return err;
>  }
> @@ -1791,6 +1980,39 @@ nfsd_unlink(struct svc_rqst *rqstp, stru
>  		goto out_nfserr;
>  
>  	if (type != S_IFDIR) { /* It's UNLINK */
> +		int i, found = 0;
> +
> +		for (i = 0 ; i < FHPARM_HASH_SIZE && !found; i++) {
> +			struct fhcache_hbucket *fhb = &fhcache_hash[i];
> +			struct fhcache *fh;
> +
> +			spin_lock(&fhb->pb_lock);
> +			for (fh = fhb->pb_head; fh; fh = fh->p_next) {
> +				if (fh->p_filp &&
> +				    fh->p_filp->f_dentry == rdentry) {
> +					/* Found the entry for removed file */
> +					struct file *file;
> +					struct svc_export *exp;
> +
> +					fh_get_cached_entries(fh, &file, &exp);
> +
> +					spin_lock(&k_nfsd_lock);
> +					list_del(&fh->p_list);
> +					spin_unlock(&k_nfsd_lock);
> +
> +					spin_unlock(&fhb->pb_lock);
> +
> +					/* Drop reference to this entry */
> +					fh_cache_put(file, exp);
> +
> +					spin_lock(&fhb->pb_lock);
> +					found = 1;
> +					break;
> +				}
> +			}
> +			spin_unlock(&fhb->pb_lock);
> +		}
> +
>  #ifdef MSNFS
>  		if ((fhp->fh_export->ex_flags & NFSEXP_MSNFS) &&
>  			(atomic_read(&rdentry->d_count) > 1)) {
> @@ -2061,23 +2283,36 @@ nfsd_permission(struct svc_rqst *rqstp, 
>  void
>  nfsd_racache_shutdown(void)
>  {
> -	struct raparms *raparm, *last_raparm;
>  	unsigned int i;
>  
> -	dprintk("nfsd: freeing readahead buffers.\n");
> +	dprintk("nfsd: freeing FH buffers.\n");
>  
> -	for (i = 0; i < RAPARM_HASH_SIZE; i++) {
> -		raparm = raparm_hash[i].pb_head;
> -		while(raparm) {
> -			last_raparm = raparm;
> -			raparm = raparm->p_next;
> -			kfree(last_raparm);
> +	/* Stop the daemon and free the list entries */
> +        kthread_stop(k_nfsd_task);
> +        k_nfsd_task = NULL;
> +
> +	for (i = 0; i < FHPARM_HASH_SIZE; i++) {
> +		struct fhcache *fhcache, *last_fhcache;
> +
> +		fhcache = fhcache_hash[i].pb_head;
> +		while(fhcache) {
> +			last_fhcache = fhcache;
> +			if (fhcache->p_filp) {
> +				struct file *file;
> +				struct svc_export *exp;
> +
> +				fh_get_cached_entries(fhcache, &file, &exp);
> +				list_del(&fhcache->p_list);
> +				fh_cache_put(file, exp);
> +			}
> +			fhcache = fhcache->p_next;
> +			kfree(last_fhcache);
>  		}
> -		raparm_hash[i].pb_head = NULL;
> +		fhcache_hash[i].pb_head = NULL;
>  	}
>  }
>  /*
> - * Initialize readahead param cache
> + * Initialize file handle cache
>   */
>  int
>  nfsd_racache_init(int cache_size)
> @@ -2085,36 +2320,48 @@ nfsd_racache_init(int cache_size)
>  	int	i;
>  	int	j = 0;
>  	int	nperbucket;
> -	struct raparms **raparm = NULL;
> +	struct fhcache **fhcache = NULL;
>  
>  
> -	if (raparm_hash[0].pb_head)
> +	if (fhcache_hash[0].pb_head)
>  		return 0;
> -	nperbucket = DIV_ROUND_UP(cache_size, RAPARM_HASH_SIZE);
> +	nperbucket = DIV_ROUND_UP(cache_size, FHPARM_HASH_SIZE);
>  	if (nperbucket < 2)
>  		nperbucket = 2;
> -	cache_size = nperbucket * RAPARM_HASH_SIZE;
> +	cache_size = nperbucket * FHPARM_HASH_SIZE;
>  
> -	dprintk("nfsd: allocating %d readahead buffers.\n", cache_size);
> +	dprintk("nfsd: allocating %d file handle cache buffers.\n", cache_size);
>  
> -	for (i = 0; i < RAPARM_HASH_SIZE; i++) {
> -		spin_lock_init(&raparm_hash[i].pb_lock);
> +	for (i = 0; i < FHPARM_HASH_SIZE; i++) {
> +		spin_lock_init(&fhcache_hash[i].pb_lock);
>  
> -		raparm = &raparm_hash[i].pb_head;
> +		fhcache = &fhcache_hash[i].pb_head;
>  		for (j = 0; j < nperbucket; j++) {
> -			*raparm = kzalloc(sizeof(struct raparms), GFP_KERNEL);
> -			if (!*raparm)
> +			*fhcache = kzalloc(sizeof(struct fhcache), GFP_KERNEL);
> +			if (!*fhcache) {
> +				dprintk("nfsd: kmalloc failed, freeing file cache buffers\n");
>  				goto out_nomem;
> -			raparm = &(*raparm)->p_next;
> +			}
> +			INIT_LIST_HEAD(&(*fhcache)->p_list);
> +			fhcache = &(*fhcache)->p_next;
>  		}
> -		*raparm = NULL;
> +		*fhcache = NULL;
>  	}
>  
>  	nfsdstats.ra_size = cache_size;
> +
> +	INIT_LIST_HEAD(&nfsd_daemon_list);
> +	k_nfsd_task = kthread_run(k_nfsd_thread, NULL, "nfsd_cacher");
> +
> +	if (IS_ERR(k_nfsd_task)) {
> +		printk(KERN_ERR "%s: unable to create kernel thread: %ld\n",
> +		       __FUNCTION__, PTR_ERR(k_nfsd_task));
> +		goto out_nomem;
> +	}
> +
>  	return 0;
>  
>  out_nomem:
> -	dprintk("nfsd: kmalloc failed, freeing readahead buffers\n");
>  	nfsd_racache_shutdown();
>  	return -ENOMEM;
>  }

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

[parent not found: <20090220154903.1e0c6952-PC62bkCOHzGdMjc06nkz3ljfA9RmPOcC@public.gmane.org>]

* Re: [RFC] [PATCH v2 1/1]: nfsd: Change caching from ino/dev to file-handle
       [not found]         ` <20090220154903.1e0c6952-PC62bkCOHzGdMjc06nkz3ljfA9RmPOcC@public.gmane.org>
@ 2009-02-21  7:27           ` Krishna Kumar2
  2009-02-21 16:35             ` Jeff Layton
  0 siblings, 1 reply; 5+ messages in thread
From: Krishna Kumar2 @ 2009-02-21  7:27 UTC (permalink / raw)
  To: Jeff Layton; +Cc: bfields, linux-nfs

Hi Jeff,

Thanks for your review comments.

> It would be nice to break up this patchset some. It's very hard to
> review large-scale changes like this, and this is quite frankly not
> code that is often touched by human hands. Being able to do a git
> bisect if we run across bugs after these changes would be a very nice
> thing. It won't be helpful though unless the changes are in smaller
> pieces.

OK, I will try to break this up and send it again.

> In your earlier discussion with Bruce, you mentioned trying to
> determine when to flush the cache. When the exports table is changed
> via exportfs, the exports kernel cache is also flushed. Hooking into
> that might be the best thing...

Thanks for this suggestion - I will look into how to do this.

> I'd also go ahead and get rid of the ra_ prefixes unless you feel
> they're needed. It'd be best to clean this up so that we don't have to
> muck around in here later.

OK.

> > +/* Number of jiffies to cache the file before releasing */
> > +#define NFSD_CACHE_JIFFIES      100
>
> It'd probably be better to express this in terms of HZ so that this is
> a fixed amount of time regardless of how the kernel is compiled.

Definitely it should have been in terms of HZ.

> > +/* List of FH cache entries that has to be cleaned up when they expire
*/
> > +static struct list_head      nfsd_daemon_list;
> > +
>
> ^^^ some more descriptive variable names would be welcome...

OK.

> ...I can't say I'm thrilled about adding a kthread for this but don't
> have any specific objections. I wonder if it might be better to just
> periodically schedule (and reschedule) delayed work to the events queue
> whenever a cache entry is touched?

I was originally using queue_delayed_work but found the performance was
better when using a daemon (in that context, I had also submitted a patch
to lkml to implement a new API - queue_update_delayed_work - see:
http://lwn.net/Articles/300919/). I think the problem was due to heavy
contention with kernel timer locks, especially if there are a lot of
parallel reads at the same time (contention for the same timer "base" lock
with regular kernel timers used in other subsystems).

But I will run a test with delayed work and compare with a daemon and
report
what difference I find.

Bruce had also suggested doing a profile on new vs old kernel. Can someone
tell me what to run exactly (I got the oprofile compiled in, what user
commands
should I run and what should I look for/report)?

I would like to resubmit after getting Bruce's comments addressed, your
suggestion about workqueue vs daemon and the other changes suggested.

Thanks,

- KK

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] [PATCH v2 1/1]: nfsd: Change caching from ino/dev to file-handle
  2009-02-21  7:27           ` Krishna Kumar2
@ 2009-02-21 16:35             ` Jeff Layton
  0 siblings, 0 replies; 5+ messages in thread
From: Jeff Layton @ 2009-02-21 16:35 UTC (permalink / raw)
  To: Krishna Kumar2; +Cc: bfields, linux-nfs

> > ...I can't say I'm thrilled about adding a kthread for this but don't
> > have any specific objections. I wonder if it might be better to just
> > periodically schedule (and reschedule) delayed work to the events queue
> > whenever a cache entry is touched?
> 
> I was originally using queue_delayed_work but found the performance was
> better when using a daemon (in that context, I had also submitted a patch
> to lkml to implement a new API - queue_update_delayed_work - see:
> http://lwn.net/Articles/300919/). I think the problem was due to heavy
> contention with kernel timer locks, especially if there are a lot of
> parallel reads at the same time (contention for the same timer "base" lock
> with regular kernel timers used in other subsystems).
> 
> But I will run a test with delayed work and compare with a daemon and
> report
> what difference I find.
> 
> Bruce had also suggested doing a profile on new vs old kernel. Can someone
> tell me what to run exactly (I got the oprofile compiled in, what user
> commands
> should I run and what should I look for/report)?
> 
> I would like to resubmit after getting Bruce's comments addressed, your
> suggestion about workqueue vs daemon and the other changes suggested.
> 

It seems strange that performance would suffer from using the generic
workqueue for that. The daemon in this patch is just there to clean up
the cache, correct?

If you have valid reasons for choosing a separate kthread to handle the
cleanup then I'm ok with that. It would be a good idea to document
the reasons for that design decision in the patch description so that
we're aware of them. Hard numbers would also help make the case either
way.

As far as oprofile...it's been a while since I've used it but when I do
I usually start with wcohen's page:

http://people.redhat.com/wcohen/

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2009-02-21 16:36 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-01-24 12:34 [RFC] [PATCH v2 0/1] nfsd: Improve NFS server performance Krishna Kumar
     [not found] ` <20090124123457.10995.57636.sendpatchset-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2009-01-24 12:35   ` [RFC] [PATCH v2 1/1]: nfsd: Change caching from ino/dev to file-handle Krishna Kumar
     [not found]     ` <20090124123511.10995.88449.sendpatchset-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2009-02-20 23:49       ` Jeff Layton
     [not found]         ` <20090220154903.1e0c6952-PC62bkCOHzGdMjc06nkz3ljfA9RmPOcC@public.gmane.org>
2009-02-21  7:27           ` Krishna Kumar2
2009-02-21 16:35             ` Jeff Layton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox