* [RFC] [PATCH v2 0/1] nfsd: Improve NFS server performance
@ 2009-01-24 12:34 Krishna Kumar
[not found] ` <20090124123457.10995.57636.sendpatchset-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
0 siblings, 1 reply; 5+ messages in thread
From: Krishna Kumar @ 2009-01-24 12:34 UTC (permalink / raw)
To: bfields; +Cc: linux-nfs, Krishna Kumar
From: Krishna Kumar <krkumar2@in.ibm.com>
Patch summary:
--------------
Change the caching from ino/dev to a file handle model. Advantages:
1. Since file handles are unique, this patch removes all dependencies on the
kernel readahead parameters and it's implementation; and instead caches
files based on file handles.
2. Allows the server to not have to open/close a file multiple times when the
client reads it, and results in faster lookup times.
3. Readahead is automatically taken care of since the file is not closed while
it is getting read (quickly) by the client.
4. Another optimization is to avoid getting the cache lock twice for each read
operation (after the first time it is taken to update the cache).
List of changes from Ver1:
--------------------------
1. Implement entire logic of either updating-cache or do-nothing or close-file
in fh_cache_upd. This simplifies the caller (nfsd_read).
2. nfsd_get_fhcache doesn't overwrite existing entries which would require the
existing entries to be freed up - that is done by the daemon exclusively.
This saves time at lookup by avoiding freeing entries (fh_cache_put).
Another change is to optimize the logic of selecting a free slot.
3. Due to #2, fh_cache_upd doesn't have to test whether the entry is already
on the daemon list (it never is, plus list_del_init is changed to list_del).
4. As a result of #2, the daemon becomes simpler - there is no race to handle
where there is an entry on the list but it has no cached file/dentry, etc.
5. Made some comments clearer and easier to understand.
Performance:
-------------
This patch was tested with clients running 2, 4, 8, 16 --- 256 test processes,
each doing reads of different files. Each test includes different I/O sizes.
31/77 tests got more than 5% improvement; with 5 tests getting more than 10%
gain (full results at the end of this post).
Please review. Any comments or improvement ideas are appreciated.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
(#Test Processes on Client == #NFSD's on Server)
-----------------------------------------------------------------------
#Test Processes Bufsize Org BW KB/s New BW KB/s %
-----------------------------------------------------------------------
2 256 68022.46 71094.64 4.51
2 4096 67833.74 70726.38 4.26
2 8192 64541.14 69635.93 7.89
2 16384 65708.86 68994.88 5.00
2 32768 64272.28 68525.36 6.61
2 65536 64684.13 69103.28 6.83
2 131072 64765.67 68855.57 6.31
4 256 60849.48 64702.04 6.33
4 4096 60660.67 64309.37 6.01
4 8192 60506.00 61142.84 1.05
4 16384 60796.86 64069.82 5.38
4 32768 60947.07 64648.57 6.07
4 65536 60774.45 63735.24 4.87
4 131072 61369.66 65261.85 6.34
8 256 49239.57 54467.33 10.61
8 4096 50650.45 55400.01 9.37
8 8192 50661.58 51732.16 2.11
8 16384 51114.76 56025.31 9.60
8 32768 52367.20 54348.05 3.78
8 65536 51000.23 54285.63 6.44
8 131072 52996.73 54021.11 1.93
16 256 44534.67 45478.60 2.11
16 4096 43897.69 46519.89 5.97
16 8192 43787.87 44083.61 .67
16 16384 43883.62 46726.03 6.47
16 32768 44284.96 44035.86 -.56
16 65536 43804.33 44865.20 2.42
16 131072 44525.30 43752.62 -1.73
32 256 40420.30 42575.30 5.33
32 4096 39913.14 42279.21 5.92
32 8192 40261.19 42399.93 5.31
32 16384 38094.95 42645.32 11.94
32 32768 40610.27 43015.37 5.92
32 65536 41438.05 41794.76 .86
32 131072 41869.06 43644.07 4.23
48 256 36986.45 40185.34 8.64
48 4096 36585.79 38227.38 4.48
48 8192 38406.78 38055.91 -.91
48 16384 34950.05 36688.86 4.97
48 32768 38908.71 37900.33 -2.59
48 65536 39364.64 40036.67 1.70
48 131072 40391.56 40887.11 1.22
64 256 32821.89 34568.06 5.32
64 4096 35468.42 35529.29 .17
64 8192 34135.44 36462.31 6.81
64 16384 31926.51 32694.91 2.40
64 32768 35527.69 35234.60 -.82
64 65536 36066.08 36359.77 .81
64 131072 35969.04 37462.86 4.15
96 256 30032.74 29973.45 -.19
96 4096 29687.06 30881.64 4.02
96 8192 31142.51 32504.66 4.37
96 16384 29546.77 30663.39 3.77
96 32768 32458.94 32797.36 1.04
96 65536 32826.99 33451.66 1.90
96 131072 33601.46 34177.39 1.71
128 256 28584.59 29092.11 1.77
128 4096 29311.11 30097.65 2.68
128 8192 31398.87 33154.63 5.59
128 16384 28283.58 29071.45 2.78
128 32768 32819.93 33654.11 2.54
128 65536 32617.13 33778.46 3.56
128 131072 32972.71 34160.82 3.60
192 256 25245.92 26331.48 4.29
192 4096 27368.48 29318.49 7.12
192 8192 30173.74 31477.35 4.32
192 16384 26388.54 29719.15 12.62
192 32768 31840.91 33606.17 5.54
192 65536 33374.85 33607.14 .69
192 131072 33523.48 32601.93 -2.74
256 256 22256.91 21139.79 -5.01
256 4096 25192.75 24281.51 -3.61
256 8192 26534.95 28100.59 5.90
256 16384 24162.85 25607.86 5.98
256 32768 29218.38 29417.28 .68
256 65536 29609.59 30049.79 1.48
256 131072 30014.29 30132.33 .39
-----------------------------------------------------------------------
^ permalink raw reply [flat|nested] 5+ messages in thread[parent not found: <20090124123457.10995.57636.sendpatchset-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>]
* [RFC] [PATCH v2 1/1]: nfsd: Change caching from ino/dev to file-handle [not found] ` <20090124123457.10995.57636.sendpatchset-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org> @ 2009-01-24 12:35 ` Krishna Kumar [not found] ` <20090124123511.10995.88449.sendpatchset-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org> 0 siblings, 1 reply; 5+ messages in thread From: Krishna Kumar @ 2009-01-24 12:35 UTC (permalink / raw) To: bfields; +Cc: linux-nfs, Krishna Kumar From: Krishna Kumar <krkumar2@in.ibm.com> Implement the file handle caching. List of changes: 1. Remove all implementation and users of readahead, rename RA to FH, parm to cache. 2. Add fields in the fhparms to cache file, svc_export, expiry list and expiry time. Modify some other fields (eg p_count is atomic). 3. Implement a daemon to clean up cached FH's. 4. Added four helper functions: fh_cache_get: Hold a reference to dentry and svc_export. fh_cache_put: Drop a reference to file, dentry and svc_export. fh_get_cached_entries: Returns cached file and svc_export. fh_cache_upd: Cache file and svc_export. Add entry to list for daemon to cleanup. 5. nfsd_get_raparms is slightly rewritten (changed to nfsd_get_fhcache). 6. nfsd_read rewritten to use the cache, remove RA from nfsd_vfs_read. 7. File remove operation from the client results in the server checking the cache and drops reference immediately (remove operation on the server still retains the reference for a short time before the daemon frees up the entry). 8. init and shutdown are slightly modified. (ra_size, ra_depth, nfsd_racache_init and nfsd_racache_shutdown still retain the "ra" prefix for now) Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> --- fs/nfsd/vfs.c | 429 +++++++++++++++++++++++++++++++++++++----------- 1 file changed, 338 insertions(+), 91 deletions(-) diff -ruNp 2.6.29-rc2.org/fs/nfsd/vfs.c 2.6.29-rc2.new/fs/nfsd/vfs.c --- 2.6.29-rc2.org/fs/nfsd/vfs.c 2009-01-22 13:23:18.000000000 +0530 +++ 2.6.29-rc2.new/fs/nfsd/vfs.c 2009-01-22 13:23:18.000000000 +0530 @@ -55,38 +55,53 @@ #include <linux/security.h> #endif /* CONFIG_NFSD_V4 */ #include <linux/jhash.h> +#include <linux/kthread.h> #include <asm/uaccess.h> #define NFSDDBG_FACILITY NFSDDBG_FILEOP +/* Number of jiffies to cache the file before releasing */ +#define NFSD_CACHE_JIFFIES 100 /* - * This is a cache of readahead params that help us choose the proper - * readahead strategy. Initially, we set all readahead parameters to 0 - * and let the VFS handle things. + * This is a cache of file handles to quicken file lookup. This also + * helps prevent multiple open/close of a file when a client reads it. + * * If you increase the number of cached files very much, you'll need to * add a hash table here. */ -struct raparms { - struct raparms *p_next; - unsigned int p_count; - ino_t p_ino; - dev_t p_dev; - int p_set; - struct file_ra_state p_ra; +struct fhcache { + struct fhcache *p_next; + + /* Hashed on this parameter */ + __u32 p_auth; + + /* Cached information */ + struct file *p_filp; + struct svc_export *p_exp; + + /* Refcount for overwrite */ + atomic_t p_count; + + /* When this entry expires */ + unsigned long p_expires; + + /* List of entries linked to 'nfsd_daemon_list' */ + struct list_head p_list; + unsigned int p_hindex; }; -struct raparm_hbucket { - struct raparms *pb_head; +struct fhcache_hbucket { + struct fhcache *pb_head; spinlock_t pb_lock; } ____cacheline_aligned_in_smp; -#define RAPARM_HASH_BITS 4 -#define RAPARM_HASH_SIZE (1<<RAPARM_HASH_BITS) -#define RAPARM_HASH_MASK (RAPARM_HASH_SIZE-1) -static struct raparm_hbucket raparm_hash[RAPARM_HASH_SIZE]; +#define FHPARM_HASH_BITS 8 +#define FHPARM_HASH_SIZE (1<<FHPARM_HASH_BITS) +#define FHPARM_HASH_MASK (FHPARM_HASH_SIZE-1) +static struct fhcache_hbucket fhcache_hash[FHPARM_HASH_SIZE]; /* * Called from nfsd_lookup and encode_dirent. Check if we have crossed @@ -784,51 +799,223 @@ nfsd_sync_dir(struct dentry *dp) return nfsd_dosync(NULL, dp, dp->d_inode->i_fop); } +/* Daemon to handle expired fh cache entries */ +static struct task_struct *k_nfsd_task; + +/* Synchronization for daemon with enqueuer's */ +static DEFINE_SPINLOCK(k_nfsd_lock); + +/* List of FH cache entries that has to be cleaned up when they expire */ +static struct list_head nfsd_daemon_list; + +/* + * Returns cached values of 'file' and svc_export; resets these entries + * to NULL. + */ +static inline void fh_get_cached_entries(struct fhcache *fh, + struct file **filep, + struct svc_export **expp) +{ + *filep = fh->p_filp; + *expp = fh->p_exp; + + fh->p_filp = NULL; + fh->p_exp = NULL; +} + +/* + * Hold a reference to dentry and svc_export (file already has an extra + * reference count as it is not closed normally. + */ +static inline void fh_cache_get(struct file *file, struct svc_export *exp) +{ + dget(file->f_dentry); + cache_get(&exp->h); +} + +/* Drop a reference to file, dentry and svc_export */ +static inline void fh_cache_put(struct file *file, struct svc_export *exp) +{ + cache_put(&exp->h, &svc_export_cache); + dput(file->f_dentry); + fput(file); +} + /* - * Obtain the readahead parameters for the file - * specified by (dev, ino). + * Holds a reference to dentry and svc_export, and caches both. Add fh entry + * to list for daemon to cleanup later. Once we add the entry to the list, + * we'd rather it expire prematurely rather than updating it on every read. */ +static inline void fh_cache_upd(struct fhcache *fh, struct file *file, + struct svc_export *exp) +{ + if (fh) { + if (!fh->p_filp && file) { + struct fhcache_hbucket *fhb; + + fh_cache_get(file, exp); + + fhb = &fhcache_hash[fh->p_hindex]; + + spin_lock(&fhb->pb_lock); + fh->p_filp = file; + fh->p_exp = exp; + fh->p_expires = jiffies + NFSD_CACHE_JIFFIES; -static inline struct raparms * -nfsd_get_raparms(dev_t dev, ino_t ino) + spin_lock(&k_nfsd_lock); + list_add_tail(&fh->p_list, &nfsd_daemon_list); + spin_unlock(&k_nfsd_lock); + + spin_unlock(&fhb->pb_lock); + } + + /* Drop our reference */ + atomic_dec(&fh->p_count); + } else if (file) + nfsd_close(file); +} + +/* Daemon cache cleanup handler */ +void daemon_free_entries(void) { - struct raparms *ra, **rap, **frap = NULL; - int depth = 0; - unsigned int hash; - struct raparm_hbucket *rab; + unsigned long now = jiffies; + + spin_lock(&k_nfsd_lock); + while (!list_empty(&nfsd_daemon_list)) { + struct fhcache *fh = list_entry(nfsd_daemon_list.next, + struct fhcache, p_list); + struct fhcache_hbucket *fhb; + struct file *file; + struct svc_export *exp; + + if (time_after(fh->p_expires, now) || now != jiffies) { + /* + * This (and all subsequent entries) have not expired; + * or we have spent too long in this loop. + */ + break; + } + + fhb = &fhcache_hash[fh->p_hindex]; + + /* + * Make sure we do not deadlock with updaters - we can free + * entry next time in case of a race. + */ + if (!spin_trylock(&fhb->pb_lock)) { + /* + * Entry is being used, no need to free this, try later + */ + break; + } + + if (atomic_read(&fh->p_count)) { + spin_unlock(&fhb->pb_lock); + break; + } - hash = jhash_2words(dev, ino, 0xfeedbeef) & RAPARM_HASH_MASK; - rab = &raparm_hash[hash]; + list_del(&fh->p_list); + fh_get_cached_entries(fh, &file, &exp); + spin_unlock(&fhb->pb_lock); + spin_unlock(&k_nfsd_lock); + + fh_cache_put(file, exp); + spin_lock(&k_nfsd_lock); + } + spin_unlock(&k_nfsd_lock); +} + +static int k_nfsd_thread(void *unused) +{ + while (!kthread_should_stop()) { + schedule_timeout_interruptible(NFSD_CACHE_JIFFIES); - spin_lock(&rab->pb_lock); - for (rap = &rab->pb_head; (ra = *rap); rap = &ra->p_next) { - if (ra->p_ino == ino && ra->p_dev == dev) + if (kthread_should_stop()) + break; + + daemon_free_entries(); + } + __set_current_state(TASK_RUNNING); + + return 0; +} + +/* + * Obtain the cached file, export and d_inode values for the FH + * specified by fh->auth[3] + */ +static inline struct fhcache * +nfsd_get_fhcache(__u32 auth) +{ + struct fhcache *fh, **fhp, **ffhp = NULL; + int depth = 0; + unsigned int hash; + struct fhcache_hbucket *fhb; + + if (!auth) + return NULL; + + hash = jhash_1word(auth, 0xfeedbeef) & FHPARM_HASH_MASK; + fhb = &fhcache_hash[hash]; + + spin_lock(&fhb->pb_lock); + for (fhp = &fhb->pb_head; (fh = *fhp); fhp = &fh->p_next) { + if (fh->p_auth == auth) { + /* Same inode */ + if (unlikely(!fh->p_filp)) { + /* + * Someone is racing in the same code, and + * this is the first reference to this file. + */ + spin_unlock(&fhb->pb_lock); + return NULL; + } + + /* + * Hold an extra reference to dentry/exp since these + * are released in fh_put(). 'file' already has an + * extra hold from the first lookup which was never + * dropped. + */ + fh_cache_get(fh->p_filp, fh->p_exp); goto found; + } + depth++; - if (ra->p_count == 0) - frap = rap; + + if (!ffhp && !fh->p_filp) { + /* + * This is an unused inode (or a different one), and no + * entry was found till now. + */ + if (!atomic_read(&fh->p_count)) /* Entry is unused */ + ffhp = fhp; + } } - depth = nfsdstats.ra_size*11/10; - if (!frap) { - spin_unlock(&rab->pb_lock); + + if (!ffhp) { + spin_unlock(&fhb->pb_lock); return NULL; } - rap = frap; - ra = *frap; - ra->p_dev = dev; - ra->p_ino = ino; - ra->p_set = 0; - ra->p_hindex = hash; + + depth = nfsdstats.ra_size*11/10; + fhp = ffhp; + fh = *ffhp; + fh->p_hindex = hash; + fh->p_auth = auth; + found: - if (rap != &rab->pb_head) { - *rap = ra->p_next; - ra->p_next = rab->pb_head; - rab->pb_head = ra; + if (fhp != &fhb->pb_head) { + *fhp = fh->p_next; + fh->p_next = fhb->pb_head; + fhb->pb_head = fh; } - ra->p_count++; + + atomic_inc(&fh->p_count); nfsdstats.ra_depth[depth*10/nfsdstats.ra_size]++; - spin_unlock(&rab->pb_lock); - return ra; + spin_unlock(&fhb->pb_lock); + + return fh; } /* @@ -892,7 +1079,6 @@ nfsd_vfs_read(struct svc_rqst *rqstp, st loff_t offset, struct kvec *vec, int vlen, unsigned long *count) { struct inode *inode; - struct raparms *ra; mm_segment_t oldfs; __be32 err; int host_err; @@ -903,11 +1089,6 @@ nfsd_vfs_read(struct svc_rqst *rqstp, st if (svc_msnfs(fhp) && !lock_may_read(inode, offset, *count)) goto out; - /* Get readahead parameters */ - ra = nfsd_get_raparms(inode->i_sb->s_dev, inode->i_ino); - - if (ra && ra->p_set) - file->f_ra = ra->p_ra; if (file->f_op->splice_read && rqstp->rq_splice_ok) { struct splice_desc sd = { @@ -926,16 +1107,6 @@ nfsd_vfs_read(struct svc_rqst *rqstp, st set_fs(oldfs); } - /* Write back readahead params */ - if (ra) { - struct raparm_hbucket *rab = &raparm_hash[ra->p_hindex]; - spin_lock(&rab->pb_lock); - ra->p_ra = file->f_ra; - ra->p_set = 1; - ra->p_count--; - spin_unlock(&rab->pb_lock); - } - if (host_err >= 0) { nfsdstats.io_read += host_err; *count = host_err; @@ -1078,12 +1249,30 @@ nfsd_read(struct svc_rqst *rqstp, struct goto out; err = nfsd_vfs_read(rqstp, fhp, file, offset, vec, vlen, count); } else { - err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, &file); - if (err) - goto out; - err = nfsd_vfs_read(rqstp, fhp, file, offset, vec, vlen, count); - nfsd_close(file); + struct fhcache *fh; + + /* Check if this fh is cached */ + fh = nfsd_get_fhcache(fhp->fh_handle.fh_auth[3]); + if (fh && fh->p_filp) { + /* Got cached values */ + file = fh->p_filp; + fhp->fh_dentry = file->f_dentry; + fhp->fh_export = fh->p_exp; + err = fh_verify(rqstp, fhp, S_IFREG, NFSD_MAY_READ); + } else { + /* Nothing in cache */ + err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, + &file); + } + + if (!err) + err = nfsd_vfs_read(rqstp, fhp, file, offset, vec, vlen, + count); + + /* Update cached values if required, and clean up */ + fh_cache_upd(fh, file, fhp->fh_export); } + out: return err; } @@ -1791,6 +1980,39 @@ nfsd_unlink(struct svc_rqst *rqstp, stru goto out_nfserr; if (type != S_IFDIR) { /* It's UNLINK */ + int i, found = 0; + + for (i = 0 ; i < FHPARM_HASH_SIZE && !found; i++) { + struct fhcache_hbucket *fhb = &fhcache_hash[i]; + struct fhcache *fh; + + spin_lock(&fhb->pb_lock); + for (fh = fhb->pb_head; fh; fh = fh->p_next) { + if (fh->p_filp && + fh->p_filp->f_dentry == rdentry) { + /* Found the entry for removed file */ + struct file *file; + struct svc_export *exp; + + fh_get_cached_entries(fh, &file, &exp); + + spin_lock(&k_nfsd_lock); + list_del(&fh->p_list); + spin_unlock(&k_nfsd_lock); + + spin_unlock(&fhb->pb_lock); + + /* Drop reference to this entry */ + fh_cache_put(file, exp); + + spin_lock(&fhb->pb_lock); + found = 1; + break; + } + } + spin_unlock(&fhb->pb_lock); + } + #ifdef MSNFS if ((fhp->fh_export->ex_flags & NFSEXP_MSNFS) && (atomic_read(&rdentry->d_count) > 1)) { @@ -2061,23 +2283,36 @@ nfsd_permission(struct svc_rqst *rqstp, void nfsd_racache_shutdown(void) { - struct raparms *raparm, *last_raparm; unsigned int i; - dprintk("nfsd: freeing readahead buffers.\n"); + dprintk("nfsd: freeing FH buffers.\n"); - for (i = 0; i < RAPARM_HASH_SIZE; i++) { - raparm = raparm_hash[i].pb_head; - while(raparm) { - last_raparm = raparm; - raparm = raparm->p_next; - kfree(last_raparm); + /* Stop the daemon and free the list entries */ + kthread_stop(k_nfsd_task); + k_nfsd_task = NULL; + + for (i = 0; i < FHPARM_HASH_SIZE; i++) { + struct fhcache *fhcache, *last_fhcache; + + fhcache = fhcache_hash[i].pb_head; + while(fhcache) { + last_fhcache = fhcache; + if (fhcache->p_filp) { + struct file *file; + struct svc_export *exp; + + fh_get_cached_entries(fhcache, &file, &exp); + list_del(&fhcache->p_list); + fh_cache_put(file, exp); + } + fhcache = fhcache->p_next; + kfree(last_fhcache); } - raparm_hash[i].pb_head = NULL; + fhcache_hash[i].pb_head = NULL; } } /* - * Initialize readahead param cache + * Initialize file handle cache */ int nfsd_racache_init(int cache_size) @@ -2085,36 +2320,48 @@ nfsd_racache_init(int cache_size) int i; int j = 0; int nperbucket; - struct raparms **raparm = NULL; + struct fhcache **fhcache = NULL; - if (raparm_hash[0].pb_head) + if (fhcache_hash[0].pb_head) return 0; - nperbucket = DIV_ROUND_UP(cache_size, RAPARM_HASH_SIZE); + nperbucket = DIV_ROUND_UP(cache_size, FHPARM_HASH_SIZE); if (nperbucket < 2) nperbucket = 2; - cache_size = nperbucket * RAPARM_HASH_SIZE; + cache_size = nperbucket * FHPARM_HASH_SIZE; - dprintk("nfsd: allocating %d readahead buffers.\n", cache_size); + dprintk("nfsd: allocating %d file handle cache buffers.\n", cache_size); - for (i = 0; i < RAPARM_HASH_SIZE; i++) { - spin_lock_init(&raparm_hash[i].pb_lock); + for (i = 0; i < FHPARM_HASH_SIZE; i++) { + spin_lock_init(&fhcache_hash[i].pb_lock); - raparm = &raparm_hash[i].pb_head; + fhcache = &fhcache_hash[i].pb_head; for (j = 0; j < nperbucket; j++) { - *raparm = kzalloc(sizeof(struct raparms), GFP_KERNEL); - if (!*raparm) + *fhcache = kzalloc(sizeof(struct fhcache), GFP_KERNEL); + if (!*fhcache) { + dprintk("nfsd: kmalloc failed, freeing file cache buffers\n"); goto out_nomem; - raparm = &(*raparm)->p_next; + } + INIT_LIST_HEAD(&(*fhcache)->p_list); + fhcache = &(*fhcache)->p_next; } - *raparm = NULL; + *fhcache = NULL; } nfsdstats.ra_size = cache_size; + + INIT_LIST_HEAD(&nfsd_daemon_list); + k_nfsd_task = kthread_run(k_nfsd_thread, NULL, "nfsd_cacher"); + + if (IS_ERR(k_nfsd_task)) { + printk(KERN_ERR "%s: unable to create kernel thread: %ld\n", + __FUNCTION__, PTR_ERR(k_nfsd_task)); + goto out_nomem; + } + return 0; out_nomem: - dprintk("nfsd: kmalloc failed, freeing readahead buffers\n"); nfsd_racache_shutdown(); return -ENOMEM; } ^ permalink raw reply [flat|nested] 5+ messages in thread
[parent not found: <20090124123511.10995.88449.sendpatchset-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>]
* Re: [RFC] [PATCH v2 1/1]: nfsd: Change caching from ino/dev to file-handle [not found] ` <20090124123511.10995.88449.sendpatchset-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org> @ 2009-02-20 23:49 ` Jeff Layton [not found] ` <20090220154903.1e0c6952-PC62bkCOHzGdMjc06nkz3ljfA9RmPOcC@public.gmane.org> 0 siblings, 1 reply; 5+ messages in thread From: Jeff Layton @ 2009-02-20 23:49 UTC (permalink / raw) To: Krishna Kumar; +Cc: bfields, linux-nfs On Sat, 24 Jan 2009 18:05:11 +0530 Krishna Kumar <krkumar2@in.ibm.com> wrote: > From: Krishna Kumar <krkumar2@in.ibm.com> > > Implement the file handle caching. List of changes: > > 1. Remove all implementation and users of readahead, rename RA to FH, > parm to cache. > 2. Add fields in the fhparms to cache file, svc_export, expiry list > and expiry time. Modify some other fields (eg p_count is atomic). > 3. Implement a daemon to clean up cached FH's. > 4. Added four helper functions: > fh_cache_get: Hold a reference to dentry and svc_export. > fh_cache_put: Drop a reference to file, dentry and svc_export. > fh_get_cached_entries: Returns cached file and svc_export. > fh_cache_upd: Cache file and svc_export. Add entry to list for > daemon to cleanup. > 5. nfsd_get_raparms is slightly rewritten (changed to nfsd_get_fhcache). > 6. nfsd_read rewritten to use the cache, remove RA from nfsd_vfs_read. > 7. File remove operation from the client results in the server checking > the cache and drops reference immediately (remove operation on the > server still retains the reference for a short time before the > daemon frees up the entry). > 8. init and shutdown are slightly modified. > > (ra_size, ra_depth, nfsd_racache_init and nfsd_racache_shutdown still > retain the "ra" prefix for now) > > > Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> > --- > fs/nfsd/vfs.c | 429 +++++++++++++++++++++++++++++++++++++----------- > 1 file changed, 338 insertions(+), 91 deletions(-) > Hi Krishna. Sorry for the slow response on my part as well. First, some general comments: It would be nice to break up this patchset some. It's very hard to review large-scale changes like this, and this is quite frankly not code that is often touched by human hands. Being able to do a git bisect if we run across bugs after these changes would be a very nice thing. It won't be helpful though unless the changes are in smaller pieces. As a general guideline you'll want to try to do the changes in smaller pieces, but each patch should leave the tree in a working state. It's hard to be more specific than that without delving more deeply (which I don't have time to do at the moment). In your earlier discussion with Bruce, you mentioned trying to determine when to flush the cache. When the exports table is changed via exportfs, the exports kernel cache is also flushed. Hooking into that might be the best thing... I'd also go ahead and get rid of the ra_ prefixes unless you feel they're needed. It'd be best to clean this up so that we don't have to muck around in here later. More comments below: > diff -ruNp 2.6.29-rc2.org/fs/nfsd/vfs.c 2.6.29-rc2.new/fs/nfsd/vfs.c > --- 2.6.29-rc2.org/fs/nfsd/vfs.c 2009-01-22 13:23:18.000000000 +0530 > +++ 2.6.29-rc2.new/fs/nfsd/vfs.c 2009-01-22 13:23:18.000000000 +0530 > @@ -55,38 +55,53 @@ > #include <linux/security.h> > #endif /* CONFIG_NFSD_V4 */ > #include <linux/jhash.h> > +#include <linux/kthread.h> > > #include <asm/uaccess.h> > > #define NFSDDBG_FACILITY NFSDDBG_FILEOP > > +/* Number of jiffies to cache the file before releasing */ > +#define NFSD_CACHE_JIFFIES 100 > ^^^^ It'd probably be better to express this in terms of HZ so that this is a fixed amount of time regardless of how the kernel is compiled. > /* > - * This is a cache of readahead params that help us choose the proper > - * readahead strategy. Initially, we set all readahead parameters to 0 > - * and let the VFS handle things. > + * This is a cache of file handles to quicken file lookup. This also > + * helps prevent multiple open/close of a file when a client reads it. > + * > * If you increase the number of cached files very much, you'll need to > * add a hash table here. > */ > -struct raparms { > - struct raparms *p_next; > - unsigned int p_count; > - ino_t p_ino; > - dev_t p_dev; > - int p_set; > - struct file_ra_state p_ra; > +struct fhcache { > + struct fhcache *p_next; > + > + /* Hashed on this parameter */ > + __u32 p_auth; > + > + /* Cached information */ > + struct file *p_filp; > + struct svc_export *p_exp; > + > + /* Refcount for overwrite */ > + atomic_t p_count; > + > + /* When this entry expires */ > + unsigned long p_expires; > + > + /* List of entries linked to 'nfsd_daemon_list' */ > + struct list_head p_list; > + > unsigned int p_hindex; > }; > > -struct raparm_hbucket { > - struct raparms *pb_head; > +struct fhcache_hbucket { > + struct fhcache *pb_head; > spinlock_t pb_lock; > } ____cacheline_aligned_in_smp; > > -#define RAPARM_HASH_BITS 4 > -#define RAPARM_HASH_SIZE (1<<RAPARM_HASH_BITS) > -#define RAPARM_HASH_MASK (RAPARM_HASH_SIZE-1) > -static struct raparm_hbucket raparm_hash[RAPARM_HASH_SIZE]; > +#define FHPARM_HASH_BITS 8 > +#define FHPARM_HASH_SIZE (1<<FHPARM_HASH_BITS) > +#define FHPARM_HASH_MASK (FHPARM_HASH_SIZE-1) > +static struct fhcache_hbucket fhcache_hash[FHPARM_HASH_SIZE]; > > /* > * Called from nfsd_lookup and encode_dirent. Check if we have crossed > @@ -784,51 +799,223 @@ nfsd_sync_dir(struct dentry *dp) > return nfsd_dosync(NULL, dp, dp->d_inode->i_fop); > } > > +/* Daemon to handle expired fh cache entries */ > +static struct task_struct *k_nfsd_task; > + > +/* Synchronization for daemon with enqueuer's */ > +static DEFINE_SPINLOCK(k_nfsd_lock); > + > +/* List of FH cache entries that has to be cleaned up when they expire */ > +static struct list_head nfsd_daemon_list; > + ^^^ some more descriptive variable names would be welcome... > +/* > + * Returns cached values of 'file' and svc_export; resets these entries > + * to NULL. > + */ > +static inline void fh_get_cached_entries(struct fhcache *fh, > + struct file **filep, > + struct svc_export **expp) > +{ > + *filep = fh->p_filp; > + *expp = fh->p_exp; > + > + fh->p_filp = NULL; > + fh->p_exp = NULL; > +} > + > +/* > + * Hold a reference to dentry and svc_export (file already has an extra > + * reference count as it is not closed normally. > + */ > +static inline void fh_cache_get(struct file *file, struct svc_export *exp) > +{ > + dget(file->f_dentry); > + cache_get(&exp->h); > +} > + > +/* Drop a reference to file, dentry and svc_export */ > +static inline void fh_cache_put(struct file *file, struct svc_export *exp) > +{ > + cache_put(&exp->h, &svc_export_cache); > + dput(file->f_dentry); > + fput(file); > +} > + > /* > - * Obtain the readahead parameters for the file > - * specified by (dev, ino). > + * Holds a reference to dentry and svc_export, and caches both. Add fh entry > + * to list for daemon to cleanup later. Once we add the entry to the list, > + * we'd rather it expire prematurely rather than updating it on every read. > */ > +static inline void fh_cache_upd(struct fhcache *fh, struct file *file, > + struct svc_export *exp) > +{ > + if (fh) { > + if (!fh->p_filp && file) { > + struct fhcache_hbucket *fhb; > + > + fh_cache_get(file, exp); > + > + fhb = &fhcache_hash[fh->p_hindex]; > + > + spin_lock(&fhb->pb_lock); > + fh->p_filp = file; > + fh->p_exp = exp; > + fh->p_expires = jiffies + NFSD_CACHE_JIFFIES; > > -static inline struct raparms * > -nfsd_get_raparms(dev_t dev, ino_t ino) > + spin_lock(&k_nfsd_lock); > + list_add_tail(&fh->p_list, &nfsd_daemon_list); > + spin_unlock(&k_nfsd_lock); > + > + spin_unlock(&fhb->pb_lock); > + } > + > + /* Drop our reference */ > + atomic_dec(&fh->p_count); > + } else if (file) > + nfsd_close(file); > +} > + > +/* Daemon cache cleanup handler */ > +void daemon_free_entries(void) > { > - struct raparms *ra, **rap, **frap = NULL; > - int depth = 0; > - unsigned int hash; > - struct raparm_hbucket *rab; > + unsigned long now = jiffies; > + > + spin_lock(&k_nfsd_lock); > + while (!list_empty(&nfsd_daemon_list)) { > + struct fhcache *fh = list_entry(nfsd_daemon_list.next, > + struct fhcache, p_list); > + struct fhcache_hbucket *fhb; > + struct file *file; > + struct svc_export *exp; > + > + if (time_after(fh->p_expires, now) || now != jiffies) { > + /* > + * This (and all subsequent entries) have not expired; > + * or we have spent too long in this loop. > + */ > + break; > + } > + > + fhb = &fhcache_hash[fh->p_hindex]; > + > + /* > + * Make sure we do not deadlock with updaters - we can free > + * entry next time in case of a race. > + */ > + if (!spin_trylock(&fhb->pb_lock)) { > + /* > + * Entry is being used, no need to free this, try later > + */ > + break; > + } > + > + if (atomic_read(&fh->p_count)) { > + spin_unlock(&fhb->pb_lock); > + break; > + } > > - hash = jhash_2words(dev, ino, 0xfeedbeef) & RAPARM_HASH_MASK; > - rab = &raparm_hash[hash]; > + list_del(&fh->p_list); > + fh_get_cached_entries(fh, &file, &exp); > + spin_unlock(&fhb->pb_lock); > + spin_unlock(&k_nfsd_lock); > + > + fh_cache_put(file, exp); > + spin_lock(&k_nfsd_lock); > + } > + spin_unlock(&k_nfsd_lock); > +} > + > +static int k_nfsd_thread(void *unused) > +{ > + while (!kthread_should_stop()) { > + schedule_timeout_interruptible(NFSD_CACHE_JIFFIES); > > - spin_lock(&rab->pb_lock); > - for (rap = &rab->pb_head; (ra = *rap); rap = &ra->p_next) { > - if (ra->p_ino == ino && ra->p_dev == dev) > + if (kthread_should_stop()) > + break; > + > + daemon_free_entries(); > + } > + __set_current_state(TASK_RUNNING); > + > + return 0; > +} > + ^^^ ...I can't say I'm thrilled about adding a kthread for this but don't have any specific objections. I wonder if it might be better to just periodically schedule (and reschedule) delayed work to the events queue whenever a cache entry is touched? > +/* > + * Obtain the cached file, export and d_inode values for the FH > + * specified by fh->auth[3] > + */ > +static inline struct fhcache * > +nfsd_get_fhcache(__u32 auth) > +{ > + struct fhcache *fh, **fhp, **ffhp = NULL; > + int depth = 0; > + unsigned int hash; > + struct fhcache_hbucket *fhb; > + > + if (!auth) > + return NULL; > + > + hash = jhash_1word(auth, 0xfeedbeef) & FHPARM_HASH_MASK; > + fhb = &fhcache_hash[hash]; > + > + spin_lock(&fhb->pb_lock); > + for (fhp = &fhb->pb_head; (fh = *fhp); fhp = &fh->p_next) { > + if (fh->p_auth == auth) { > + /* Same inode */ > + if (unlikely(!fh->p_filp)) { > + /* > + * Someone is racing in the same code, and > + * this is the first reference to this file. > + */ > + spin_unlock(&fhb->pb_lock); > + return NULL; > + } > + > + /* > + * Hold an extra reference to dentry/exp since these > + * are released in fh_put(). 'file' already has an > + * extra hold from the first lookup which was never > + * dropped. > + */ > + fh_cache_get(fh->p_filp, fh->p_exp); > goto found; > + } > + > depth++; > - if (ra->p_count == 0) > - frap = rap; > + > + if (!ffhp && !fh->p_filp) { > + /* > + * This is an unused inode (or a different one), and no > + * entry was found till now. > + */ > + if (!atomic_read(&fh->p_count)) /* Entry is unused */ > + ffhp = fhp; > + } > } > - depth = nfsdstats.ra_size*11/10; > - if (!frap) { > - spin_unlock(&rab->pb_lock); > + > + if (!ffhp) { > + spin_unlock(&fhb->pb_lock); > return NULL; > } > - rap = frap; > - ra = *frap; > - ra->p_dev = dev; > - ra->p_ino = ino; > - ra->p_set = 0; > - ra->p_hindex = hash; > + > + depth = nfsdstats.ra_size*11/10; > + fhp = ffhp; > + fh = *ffhp; > + fh->p_hindex = hash; > + fh->p_auth = auth; > + > found: > - if (rap != &rab->pb_head) { > - *rap = ra->p_next; > - ra->p_next = rab->pb_head; > - rab->pb_head = ra; > + if (fhp != &fhb->pb_head) { > + *fhp = fh->p_next; > + fh->p_next = fhb->pb_head; > + fhb->pb_head = fh; > } > - ra->p_count++; > + > + atomic_inc(&fh->p_count); > nfsdstats.ra_depth[depth*10/nfsdstats.ra_size]++; > - spin_unlock(&rab->pb_lock); > - return ra; > + spin_unlock(&fhb->pb_lock); > + > + return fh; > } > > /* > @@ -892,7 +1079,6 @@ nfsd_vfs_read(struct svc_rqst *rqstp, st > loff_t offset, struct kvec *vec, int vlen, unsigned long *count) > { > struct inode *inode; > - struct raparms *ra; > mm_segment_t oldfs; > __be32 err; > int host_err; > @@ -903,11 +1089,6 @@ nfsd_vfs_read(struct svc_rqst *rqstp, st > if (svc_msnfs(fhp) && !lock_may_read(inode, offset, *count)) > goto out; > > - /* Get readahead parameters */ > - ra = nfsd_get_raparms(inode->i_sb->s_dev, inode->i_ino); > - > - if (ra && ra->p_set) > - file->f_ra = ra->p_ra; > > if (file->f_op->splice_read && rqstp->rq_splice_ok) { > struct splice_desc sd = { > @@ -926,16 +1107,6 @@ nfsd_vfs_read(struct svc_rqst *rqstp, st > set_fs(oldfs); > } > > - /* Write back readahead params */ > - if (ra) { > - struct raparm_hbucket *rab = &raparm_hash[ra->p_hindex]; > - spin_lock(&rab->pb_lock); > - ra->p_ra = file->f_ra; > - ra->p_set = 1; > - ra->p_count--; > - spin_unlock(&rab->pb_lock); > - } > - > if (host_err >= 0) { > nfsdstats.io_read += host_err; > *count = host_err; > @@ -1078,12 +1249,30 @@ nfsd_read(struct svc_rqst *rqstp, struct > goto out; > err = nfsd_vfs_read(rqstp, fhp, file, offset, vec, vlen, count); > } else { > - err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, &file); > - if (err) > - goto out; > - err = nfsd_vfs_read(rqstp, fhp, file, offset, vec, vlen, count); > - nfsd_close(file); > + struct fhcache *fh; > + > + /* Check if this fh is cached */ > + fh = nfsd_get_fhcache(fhp->fh_handle.fh_auth[3]); > + if (fh && fh->p_filp) { > + /* Got cached values */ > + file = fh->p_filp; > + fhp->fh_dentry = file->f_dentry; > + fhp->fh_export = fh->p_exp; > + err = fh_verify(rqstp, fhp, S_IFREG, NFSD_MAY_READ); > + } else { > + /* Nothing in cache */ > + err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, > + &file); > + } > + > + if (!err) > + err = nfsd_vfs_read(rqstp, fhp, file, offset, vec, vlen, > + count); > + > + /* Update cached values if required, and clean up */ > + fh_cache_upd(fh, file, fhp->fh_export); > } > + > out: > return err; > } > @@ -1791,6 +1980,39 @@ nfsd_unlink(struct svc_rqst *rqstp, stru > goto out_nfserr; > > if (type != S_IFDIR) { /* It's UNLINK */ > + int i, found = 0; > + > + for (i = 0 ; i < FHPARM_HASH_SIZE && !found; i++) { > + struct fhcache_hbucket *fhb = &fhcache_hash[i]; > + struct fhcache *fh; > + > + spin_lock(&fhb->pb_lock); > + for (fh = fhb->pb_head; fh; fh = fh->p_next) { > + if (fh->p_filp && > + fh->p_filp->f_dentry == rdentry) { > + /* Found the entry for removed file */ > + struct file *file; > + struct svc_export *exp; > + > + fh_get_cached_entries(fh, &file, &exp); > + > + spin_lock(&k_nfsd_lock); > + list_del(&fh->p_list); > + spin_unlock(&k_nfsd_lock); > + > + spin_unlock(&fhb->pb_lock); > + > + /* Drop reference to this entry */ > + fh_cache_put(file, exp); > + > + spin_lock(&fhb->pb_lock); > + found = 1; > + break; > + } > + } > + spin_unlock(&fhb->pb_lock); > + } > + > #ifdef MSNFS > if ((fhp->fh_export->ex_flags & NFSEXP_MSNFS) && > (atomic_read(&rdentry->d_count) > 1)) { > @@ -2061,23 +2283,36 @@ nfsd_permission(struct svc_rqst *rqstp, > void > nfsd_racache_shutdown(void) > { > - struct raparms *raparm, *last_raparm; > unsigned int i; > > - dprintk("nfsd: freeing readahead buffers.\n"); > + dprintk("nfsd: freeing FH buffers.\n"); > > - for (i = 0; i < RAPARM_HASH_SIZE; i++) { > - raparm = raparm_hash[i].pb_head; > - while(raparm) { > - last_raparm = raparm; > - raparm = raparm->p_next; > - kfree(last_raparm); > + /* Stop the daemon and free the list entries */ > + kthread_stop(k_nfsd_task); > + k_nfsd_task = NULL; > + > + for (i = 0; i < FHPARM_HASH_SIZE; i++) { > + struct fhcache *fhcache, *last_fhcache; > + > + fhcache = fhcache_hash[i].pb_head; > + while(fhcache) { > + last_fhcache = fhcache; > + if (fhcache->p_filp) { > + struct file *file; > + struct svc_export *exp; > + > + fh_get_cached_entries(fhcache, &file, &exp); > + list_del(&fhcache->p_list); > + fh_cache_put(file, exp); > + } > + fhcache = fhcache->p_next; > + kfree(last_fhcache); > } > - raparm_hash[i].pb_head = NULL; > + fhcache_hash[i].pb_head = NULL; > } > } > /* > - * Initialize readahead param cache > + * Initialize file handle cache > */ > int > nfsd_racache_init(int cache_size) > @@ -2085,36 +2320,48 @@ nfsd_racache_init(int cache_size) > int i; > int j = 0; > int nperbucket; > - struct raparms **raparm = NULL; > + struct fhcache **fhcache = NULL; > > > - if (raparm_hash[0].pb_head) > + if (fhcache_hash[0].pb_head) > return 0; > - nperbucket = DIV_ROUND_UP(cache_size, RAPARM_HASH_SIZE); > + nperbucket = DIV_ROUND_UP(cache_size, FHPARM_HASH_SIZE); > if (nperbucket < 2) > nperbucket = 2; > - cache_size = nperbucket * RAPARM_HASH_SIZE; > + cache_size = nperbucket * FHPARM_HASH_SIZE; > > - dprintk("nfsd: allocating %d readahead buffers.\n", cache_size); > + dprintk("nfsd: allocating %d file handle cache buffers.\n", cache_size); > > - for (i = 0; i < RAPARM_HASH_SIZE; i++) { > - spin_lock_init(&raparm_hash[i].pb_lock); > + for (i = 0; i < FHPARM_HASH_SIZE; i++) { > + spin_lock_init(&fhcache_hash[i].pb_lock); > > - raparm = &raparm_hash[i].pb_head; > + fhcache = &fhcache_hash[i].pb_head; > for (j = 0; j < nperbucket; j++) { > - *raparm = kzalloc(sizeof(struct raparms), GFP_KERNEL); > - if (!*raparm) > + *fhcache = kzalloc(sizeof(struct fhcache), GFP_KERNEL); > + if (!*fhcache) { > + dprintk("nfsd: kmalloc failed, freeing file cache buffers\n"); > goto out_nomem; > - raparm = &(*raparm)->p_next; > + } > + INIT_LIST_HEAD(&(*fhcache)->p_list); > + fhcache = &(*fhcache)->p_next; > } > - *raparm = NULL; > + *fhcache = NULL; > } > > nfsdstats.ra_size = cache_size; > + > + INIT_LIST_HEAD(&nfsd_daemon_list); > + k_nfsd_task = kthread_run(k_nfsd_thread, NULL, "nfsd_cacher"); > + > + if (IS_ERR(k_nfsd_task)) { > + printk(KERN_ERR "%s: unable to create kernel thread: %ld\n", > + __FUNCTION__, PTR_ERR(k_nfsd_task)); > + goto out_nomem; > + } > + > return 0; > > out_nomem: > - dprintk("nfsd: kmalloc failed, freeing readahead buffers\n"); > nfsd_racache_shutdown(); > return -ENOMEM; > } -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 5+ messages in thread
[parent not found: <20090220154903.1e0c6952-PC62bkCOHzGdMjc06nkz3ljfA9RmPOcC@public.gmane.org>]
* Re: [RFC] [PATCH v2 1/1]: nfsd: Change caching from ino/dev to file-handle [not found] ` <20090220154903.1e0c6952-PC62bkCOHzGdMjc06nkz3ljfA9RmPOcC@public.gmane.org> @ 2009-02-21 7:27 ` Krishna Kumar2 2009-02-21 16:35 ` Jeff Layton 0 siblings, 1 reply; 5+ messages in thread From: Krishna Kumar2 @ 2009-02-21 7:27 UTC (permalink / raw) To: Jeff Layton; +Cc: bfields, linux-nfs Hi Jeff, Thanks for your review comments. > It would be nice to break up this patchset some. It's very hard to > review large-scale changes like this, and this is quite frankly not > code that is often touched by human hands. Being able to do a git > bisect if we run across bugs after these changes would be a very nice > thing. It won't be helpful though unless the changes are in smaller > pieces. OK, I will try to break this up and send it again. > In your earlier discussion with Bruce, you mentioned trying to > determine when to flush the cache. When the exports table is changed > via exportfs, the exports kernel cache is also flushed. Hooking into > that might be the best thing... Thanks for this suggestion - I will look into how to do this. > I'd also go ahead and get rid of the ra_ prefixes unless you feel > they're needed. It'd be best to clean this up so that we don't have to > muck around in here later. OK. > > +/* Number of jiffies to cache the file before releasing */ > > +#define NFSD_CACHE_JIFFIES 100 > > It'd probably be better to express this in terms of HZ so that this is > a fixed amount of time regardless of how the kernel is compiled. Definitely it should have been in terms of HZ. > > +/* List of FH cache entries that has to be cleaned up when they expire */ > > +static struct list_head nfsd_daemon_list; > > + > > ^^^ some more descriptive variable names would be welcome... OK. > ...I can't say I'm thrilled about adding a kthread for this but don't > have any specific objections. I wonder if it might be better to just > periodically schedule (and reschedule) delayed work to the events queue > whenever a cache entry is touched? I was originally using queue_delayed_work but found the performance was better when using a daemon (in that context, I had also submitted a patch to lkml to implement a new API - queue_update_delayed_work - see: http://lwn.net/Articles/300919/). I think the problem was due to heavy contention with kernel timer locks, especially if there are a lot of parallel reads at the same time (contention for the same timer "base" lock with regular kernel timers used in other subsystems). But I will run a test with delayed work and compare with a daemon and report what difference I find. Bruce had also suggested doing a profile on new vs old kernel. Can someone tell me what to run exactly (I got the oprofile compiled in, what user commands should I run and what should I look for/report)? I would like to resubmit after getting Bruce's comments addressed, your suggestion about workqueue vs daemon and the other changes suggested. Thanks, - KK ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC] [PATCH v2 1/1]: nfsd: Change caching from ino/dev to file-handle 2009-02-21 7:27 ` Krishna Kumar2 @ 2009-02-21 16:35 ` Jeff Layton 0 siblings, 0 replies; 5+ messages in thread From: Jeff Layton @ 2009-02-21 16:35 UTC (permalink / raw) To: Krishna Kumar2; +Cc: bfields, linux-nfs > > ...I can't say I'm thrilled about adding a kthread for this but don't > > have any specific objections. I wonder if it might be better to just > > periodically schedule (and reschedule) delayed work to the events queue > > whenever a cache entry is touched? > > I was originally using queue_delayed_work but found the performance was > better when using a daemon (in that context, I had also submitted a patch > to lkml to implement a new API - queue_update_delayed_work - see: > http://lwn.net/Articles/300919/). I think the problem was due to heavy > contention with kernel timer locks, especially if there are a lot of > parallel reads at the same time (contention for the same timer "base" lock > with regular kernel timers used in other subsystems). > > But I will run a test with delayed work and compare with a daemon and > report > what difference I find. > > Bruce had also suggested doing a profile on new vs old kernel. Can someone > tell me what to run exactly (I got the oprofile compiled in, what user > commands > should I run and what should I look for/report)? > > I would like to resubmit after getting Bruce's comments addressed, your > suggestion about workqueue vs daemon and the other changes suggested. > It seems strange that performance would suffer from using the generic workqueue for that. The daemon in this patch is just there to clean up the cache, correct? If you have valid reasons for choosing a separate kthread to handle the cleanup then I'm ok with that. It would be a good idea to document the reasons for that design decision in the patch description so that we're aware of them. Hard numbers would also help make the case either way. As far as oprofile...it's been a while since I've used it but when I do I usually start with wcohen's page: http://people.redhat.com/wcohen/ -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2009-02-21 16:36 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-01-24 12:34 [RFC] [PATCH v2 0/1] nfsd: Improve NFS server performance Krishna Kumar
[not found] ` <20090124123457.10995.57636.sendpatchset-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2009-01-24 12:35 ` [RFC] [PATCH v2 1/1]: nfsd: Change caching from ino/dev to file-handle Krishna Kumar
[not found] ` <20090124123511.10995.88449.sendpatchset-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2009-02-20 23:49 ` Jeff Layton
[not found] ` <20090220154903.1e0c6952-PC62bkCOHzGdMjc06nkz3ljfA9RmPOcC@public.gmane.org>
2009-02-21 7:27 ` Krishna Kumar2
2009-02-21 16:35 ` Jeff Layton
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox