linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] nfsd: duplicate reply cache improvements for 3.10
@ 2013-03-19 13:11 Jeff Layton
  2013-03-19 13:11 ` [PATCH 1/6] nfsd: eliminate one of the DRC cache searches Jeff Layton
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Jeff Layton @ 2013-03-19 13:11 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

This patchset replaces the last one I sent that starts with:

    [PATCH v2 0/5] nfsd: add a reply_cache_stats file to nfsd for tracking DRC performance

I've done quite a bit of testing over the last month or so (with much
help from the RH QA folks), and the changes here are largely due to the
results of that testing.

The main changes are:

1/ I've dropped the patch to track the max and average time to search the
   cache. What I found was that occasionally the search gets preempted
   to service an IRQ or bottom half handler, and that would blow the
   stats out of the water. Since they aren't reliably indicative, it's
   probably best not to track that for now.

   In some of the testing I did, I left this in place and disabled IRQs
   during the search. That's not a suitable change for upstream but it
   did help prove the efficacy of some of the later changes.

2/ I've added a patch to scale out the number of hash buckets with the
   max size of the cache. This seems to greatly improve performance by
   simply keeping the hash chain lengths low. I saw about 5-6 fold
   decrease in the average search time with this change. The downside
   of course is that we have to allocate that many more buckets up
   front, but I don't think that's entirely unreasonable.

These patches represent the DRC-related changes that I'd like to see
merged in 3.10, possibly sooner if we get any reports of performance
regressions in the DRC due to the changes that went into 3.9.

Jeff Layton (6):
  nfsd: eliminate one of the DRC cache searches
  nfsd: break out comparator into separate function
  nfsd: track memory utilization by the DRC
  nfsd: add new reply_cache_stats file in nfsdfs
  nfsd: keep stats on worst hash balancing seen so far
  nfsd: scale up the number of DRC hash buckets with cache size

 fs/nfsd/cache.h    |   1 +
 fs/nfsd/nfscache.c | 195 +++++++++++++++++++++++++++++++++++++++--------------
 fs/nfsd/nfsctl.c   |   9 +++
 3 files changed, 155 insertions(+), 50 deletions(-)

-- 
1.7.11.7


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/6] nfsd: eliminate one of the DRC cache searches
  2013-03-19 13:11 [PATCH 0/6] nfsd: duplicate reply cache improvements for 3.10 Jeff Layton
@ 2013-03-19 13:11 ` Jeff Layton
  2013-03-19 13:11 ` [PATCH 2/6] nfsd: break out comparator into separate function Jeff Layton
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Jeff Layton @ 2013-03-19 13:11 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

The most common case is to do a search of the cache, followed by an
insert. In the case where we have to allocate an entry off the slab,
then we end up having to redo the search, which is wasteful.

Better optimize the code for the common case by eliminating the initial
search of the cache and always preallocating an entry. In the case of a
cache hit, we'll end up just freeing that entry but that's preferable to
an extra search.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/nfsd/nfscache.c | 41 +++++++++++++++++++----------------------
 1 file changed, 19 insertions(+), 22 deletions(-)

diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index ca05f6d..c61391e 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -318,55 +318,53 @@ nfsd_cache_lookup(struct svc_rqst *rqstp)
 	__wsum			csum;
 	unsigned long		age;
 	int type = rqstp->rq_cachetype;
-	int rtn;
+	int rtn = RC_DOIT;
 
 	rqstp->rq_cacherep = NULL;
 	if (type == RC_NOCACHE) {
 		nfsdstats.rcnocache++;
-		return RC_DOIT;
+		return rtn;
 	}
 
 	csum = nfsd_cache_csum(rqstp);
 
+	/*
+	 * Since the common case is a cache miss followed by an insert,
+	 * preallocate an entry. First, try to reuse the first entry on the LRU
+	 * if it works, then go ahead and prune the LRU list.
+	 */
 	spin_lock(&cache_lock);
-	rtn = RC_DOIT;
-
-	rp = nfsd_cache_search(rqstp, csum);
-	if (rp)
-		goto found_entry;
-
-	/* Try to use the first entry on the LRU */
 	if (!list_empty(&lru_head)) {
 		rp = list_first_entry(&lru_head, struct svc_cacherep, c_lru);
 		if (nfsd_cache_entry_expired(rp) ||
 		    num_drc_entries >= max_drc_entries) {
 			lru_put_end(rp);
 			prune_cache_entries();
-			goto setup_entry;
+			goto search_cache;
 		}
 	}
 
-	/* Drop the lock and allocate a new entry */
+	/* No expired ones available, allocate a new one. */
 	spin_unlock(&cache_lock);
 	rp = nfsd_reply_cache_alloc();
-	if (!rp) {
-		dprintk("nfsd: unable to allocate DRC entry!\n");
-		return RC_DOIT;
-	}
 	spin_lock(&cache_lock);
-	++num_drc_entries;
+	if (likely(rp))
+		++num_drc_entries;
 
-	/*
-	 * Must search again just in case someone inserted one
-	 * after we dropped the lock above.
-	 */
+search_cache:
 	found = nfsd_cache_search(rqstp, csum);
 	if (found) {
-		nfsd_reply_cache_free_locked(rp);
+		if (likely(rp))
+			nfsd_reply_cache_free_locked(rp);
 		rp = found;
 		goto found_entry;
 	}
 
+	if (!rp) {
+		dprintk("nfsd: unable to allocate DRC entry!\n");
+		goto out;
+	}
+
 	/*
 	 * We're keeping the one we just allocated. Are we now over the
 	 * limit? Prune one off the tip of the LRU in trade for the one we
@@ -376,7 +374,6 @@ nfsd_cache_lookup(struct svc_rqst *rqstp)
 		nfsd_reply_cache_free_locked(list_first_entry(&lru_head,
 						struct svc_cacherep, c_lru));
 
-setup_entry:
 	nfsdstats.rcmisses++;
 	rqstp->rq_cacherep = rp;
 	rp->c_state = RC_INPROG;
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/6] nfsd: break out comparator into separate function
  2013-03-19 13:11 [PATCH 0/6] nfsd: duplicate reply cache improvements for 3.10 Jeff Layton
  2013-03-19 13:11 ` [PATCH 1/6] nfsd: eliminate one of the DRC cache searches Jeff Layton
@ 2013-03-19 13:11 ` Jeff Layton
  2013-03-19 13:11 ` [PATCH 3/6] nfsd: track memory utilization by the DRC Jeff Layton
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Jeff Layton @ 2013-03-19 13:11 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Break out the function that compares the rqstp and checksum against a
reply cache entry. While we're at it, track the efficacy of the checksum
over the NFS data by tracking the cases where we would have incorrectly
matched a DRC entry if we had not tracked it or the length.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/nfsd/nfscache.c | 46 +++++++++++++++++++++++++++++++++++-----------
 1 file changed, 35 insertions(+), 11 deletions(-)

diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index c61391e..48f5ef9 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -23,10 +23,22 @@
 static struct hlist_head *	cache_hash;
 static struct list_head 	lru_head;
 static struct kmem_cache	*drc_slab;
-static unsigned int		num_drc_entries;
+
+/* max number of entries allowed in the cache */
 static unsigned int		max_drc_entries;
 
 /*
+ * Stats and other tracking of on the duplicate reply cache. All of these and
+ * the "rc" fields in nfsdstats are protected by the cache_lock
+ */
+
+/* total number of entries */
+static unsigned int		num_drc_entries;
+
+/* cache misses due only to checksum comparison failures */
+static unsigned int		payload_misses;
+
+/*
  * Calculate the hash index from an XID.
  */
 static inline u32 request_hash(u32 xid)
@@ -273,6 +285,26 @@ nfsd_cache_csum(struct svc_rqst *rqstp)
 	return csum;
 }
 
+static bool
+nfsd_cache_match(struct svc_rqst *rqstp, __wsum csum, struct svc_cacherep *rp)
+{
+	/* Check RPC header info first */
+	if (rqstp->rq_xid != rp->c_xid || rqstp->rq_proc != rp->c_proc ||
+	    rqstp->rq_prot != rp->c_prot || rqstp->rq_vers != rp->c_vers ||
+	    rqstp->rq_arg.len != rp->c_len ||
+	    !rpc_cmp_addr(svc_addr(rqstp), (struct sockaddr *)&rp->c_addr) ||
+	    rpc_get_port(svc_addr(rqstp)) != rpc_get_port((struct sockaddr *)&rp->c_addr))
+		return false;
+
+	/* compare checksum of NFS data */
+	if (csum != rp->c_csum) {
+		++payload_misses;
+		return false;
+	}
+
+	return true;
+}
+
 /*
  * Search the request hash for an entry that matches the given rqstp.
  * Must be called with cache_lock held. Returns the found entry or
@@ -283,18 +315,10 @@ nfsd_cache_search(struct svc_rqst *rqstp, __wsum csum)
 {
 	struct svc_cacherep	*rp;
 	struct hlist_head 	*rh;
-	__be32			xid = rqstp->rq_xid;
-	u32			proto =  rqstp->rq_prot,
-				vers = rqstp->rq_vers,
-				proc = rqstp->rq_proc;
 
-	rh = &cache_hash[request_hash(xid)];
+	rh = &cache_hash[request_hash(rqstp->rq_xid)];
 	hlist_for_each_entry(rp, rh, c_hash) {
-		if (xid == rp->c_xid && proc == rp->c_proc &&
-		    proto == rp->c_prot && vers == rp->c_vers &&
-		    rqstp->rq_arg.len == rp->c_len && csum == rp->c_csum &&
-		    rpc_cmp_addr(svc_addr(rqstp), (struct sockaddr *)&rp->c_addr) &&
-		    rpc_get_port(svc_addr(rqstp)) == rpc_get_port((struct sockaddr *)&rp->c_addr))
+		if (nfsd_cache_match(rqstp, csum, rp))
 			return rp;
 	}
 	return NULL;
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/6] nfsd: track memory utilization by the DRC
  2013-03-19 13:11 [PATCH 0/6] nfsd: duplicate reply cache improvements for 3.10 Jeff Layton
  2013-03-19 13:11 ` [PATCH 1/6] nfsd: eliminate one of the DRC cache searches Jeff Layton
  2013-03-19 13:11 ` [PATCH 2/6] nfsd: break out comparator into separate function Jeff Layton
@ 2013-03-19 13:11 ` Jeff Layton
  2013-03-19 13:11 ` [PATCH 4/6] nfsd: add new reply_cache_stats file in nfsdfs Jeff Layton
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Jeff Layton @ 2013-03-19 13:11 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/nfsd/nfscache.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index 48f5ef9..1f45b33 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -38,6 +38,9 @@ static unsigned int		num_drc_entries;
 /* cache misses due only to checksum comparison failures */
 static unsigned int		payload_misses;
 
+/* amount of memory (in bytes) currently consumed by the DRC */
+static unsigned int		drc_mem_usage;
+
 /*
  * Calculate the hash index from an XID.
  */
@@ -112,12 +115,15 @@ nfsd_reply_cache_alloc(void)
 static void
 nfsd_reply_cache_free_locked(struct svc_cacherep *rp)
 {
-	if (rp->c_type == RC_REPLBUFF)
+	if (rp->c_type == RC_REPLBUFF && rp->c_replvec.iov_base) {
+		drc_mem_usage -= rp->c_replvec.iov_len;
 		kfree(rp->c_replvec.iov_base);
+	}
 	if (!hlist_unhashed(&rp->c_hash))
 		hlist_del(&rp->c_hash);
 	list_del(&rp->c_lru);
 	--num_drc_entries;
+	drc_mem_usage -= sizeof(*rp);
 	kmem_cache_free(drc_slab, rp);
 }
 
@@ -372,8 +378,10 @@ nfsd_cache_lookup(struct svc_rqst *rqstp)
 	spin_unlock(&cache_lock);
 	rp = nfsd_reply_cache_alloc();
 	spin_lock(&cache_lock);
-	if (likely(rp))
+	if (likely(rp)) {
 		++num_drc_entries;
+		drc_mem_usage += sizeof(*rp);
+	}
 
 search_cache:
 	found = nfsd_cache_search(rqstp, csum);
@@ -415,6 +423,7 @@ search_cache:
 
 	/* release any buffer */
 	if (rp->c_type == RC_REPLBUFF) {
+		drc_mem_usage -= rp->c_replvec.iov_len;
 		kfree(rp->c_replvec.iov_base);
 		rp->c_replvec.iov_base = NULL;
 	}
@@ -483,6 +492,7 @@ nfsd_cache_update(struct svc_rqst *rqstp, int cachetype, __be32 *statp)
 	struct svc_cacherep *rp = rqstp->rq_cacherep;
 	struct kvec	*resv = &rqstp->rq_res.head[0], *cachv;
 	int		len;
+	size_t		bufsize = 0;
 
 	if (!rp)
 		return;
@@ -504,19 +514,21 @@ nfsd_cache_update(struct svc_rqst *rqstp, int cachetype, __be32 *statp)
 		break;
 	case RC_REPLBUFF:
 		cachv = &rp->c_replvec;
-		cachv->iov_base = kmalloc(len << 2, GFP_KERNEL);
+		bufsize = len << 2;
+		cachv->iov_base = kmalloc(bufsize, GFP_KERNEL);
 		if (!cachv->iov_base) {
 			nfsd_reply_cache_free(rp);
 			return;
 		}
-		cachv->iov_len = len << 2;
-		memcpy(cachv->iov_base, statp, len << 2);
+		cachv->iov_len = bufsize;
+		memcpy(cachv->iov_base, statp, bufsize);
 		break;
 	case RC_NOCACHE:
 		nfsd_reply_cache_free(rp);
 		return;
 	}
 	spin_lock(&cache_lock);
+	drc_mem_usage += bufsize;
 	lru_put_end(rp);
 	rp->c_secure = rqstp->rq_secure;
 	rp->c_type = cachetype;
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 4/6] nfsd: add new reply_cache_stats file in nfsdfs
  2013-03-19 13:11 [PATCH 0/6] nfsd: duplicate reply cache improvements for 3.10 Jeff Layton
                   ` (2 preceding siblings ...)
  2013-03-19 13:11 ` [PATCH 3/6] nfsd: track memory utilization by the DRC Jeff Layton
@ 2013-03-19 13:11 ` Jeff Layton
  2013-03-19 13:11 ` [PATCH 5/6] nfsd: keep stats on worst hash balancing seen so far Jeff Layton
  2013-03-19 13:11 ` [PATCH 6/6] nfsd: scale up the number of DRC hash buckets with cache size Jeff Layton
  5 siblings, 0 replies; 7+ messages in thread
From: Jeff Layton @ 2013-03-19 13:11 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

For presenting statistics relating to duplicate reply cache.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/nfsd/cache.h    |  1 +
 fs/nfsd/nfscache.c | 25 +++++++++++++++++++++++++
 fs/nfsd/nfsctl.c   |  9 +++++++++
 3 files changed, 35 insertions(+)

diff --git a/fs/nfsd/cache.h b/fs/nfsd/cache.h
index 87fd141..d5c5b3e 100644
--- a/fs/nfsd/cache.h
+++ b/fs/nfsd/cache.h
@@ -82,6 +82,7 @@ int	nfsd_reply_cache_init(void);
 void	nfsd_reply_cache_shutdown(void);
 int	nfsd_cache_lookup(struct svc_rqst *);
 void	nfsd_cache_update(struct svc_rqst *, int, __be32 *);
+int	nfsd_reply_cache_stats_open(struct inode *, struct file *);
 
 #ifdef CONFIG_NFSD_V4
 void	nfsd4_set_statp(struct svc_rqst *rqstp, __be32 *statp);
diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index 1f45b33..fd81ca7 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -556,3 +556,28 @@ nfsd_cache_append(struct svc_rqst *rqstp, struct kvec *data)
 	vec->iov_len += data->iov_len;
 	return 1;
 }
+
+/*
+ * Note that fields may be added, removed or reordered in the future. Programs
+ * scraping this file for info should test the labels to ensure they're
+ * getting the correct field.
+ */
+static int nfsd_reply_cache_stats_show(struct seq_file *m, void *v)
+{
+	spin_lock(&cache_lock);
+	seq_printf(m, "max entries:           %u\n", max_drc_entries);
+	seq_printf(m, "num entries:           %u\n", num_drc_entries);
+	seq_printf(m, "hash buckets:          %u\n", HASHSIZE);
+	seq_printf(m, "mem usage:             %u\n", drc_mem_usage);
+	seq_printf(m, "cache hits:            %u\n", nfsdstats.rchits);
+	seq_printf(m, "cache misses:          %u\n", nfsdstats.rcmisses);
+	seq_printf(m, "not cached:            %u\n", nfsdstats.rcnocache);
+	seq_printf(m, "payload misses:        %u\n", payload_misses);
+	spin_unlock(&cache_lock);
+	return 0;
+}
+
+int nfsd_reply_cache_stats_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, nfsd_reply_cache_stats_show, NULL);
+}
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index f33455b..a830f33 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -35,6 +35,7 @@ enum {
 	NFSD_Threads,
 	NFSD_Pool_Threads,
 	NFSD_Pool_Stats,
+	NFSD_Reply_Cache_Stats,
 	NFSD_Versions,
 	NFSD_Ports,
 	NFSD_MaxBlkSize,
@@ -212,6 +213,13 @@ static const struct file_operations pool_stats_operations = {
 	.owner		= THIS_MODULE,
 };
 
+static struct file_operations reply_cache_stats_operations = {
+	.open		= nfsd_reply_cache_stats_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
 /*----------------------------------------------------------------------------*/
 /*
  * payload - write methods
@@ -1047,6 +1055,7 @@ static int nfsd_fill_super(struct super_block * sb, void * data, int silent)
 		[NFSD_Threads] = {"threads", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_Pool_Threads] = {"pool_threads", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_Pool_Stats] = {"pool_stats", &pool_stats_operations, S_IRUGO},
+		[NFSD_Reply_Cache_Stats] = {"reply_cache_stats", &reply_cache_stats_operations, S_IRUGO},
 		[NFSD_Versions] = {"versions", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_Ports] = {"portlist", &transaction_ops, S_IWUSR|S_IRUGO},
 		[NFSD_MaxBlkSize] = {"max_block_size", &transaction_ops, S_IWUSR|S_IRUGO},
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 5/6] nfsd: keep stats on worst hash balancing seen so far
  2013-03-19 13:11 [PATCH 0/6] nfsd: duplicate reply cache improvements for 3.10 Jeff Layton
                   ` (3 preceding siblings ...)
  2013-03-19 13:11 ` [PATCH 4/6] nfsd: add new reply_cache_stats file in nfsdfs Jeff Layton
@ 2013-03-19 13:11 ` Jeff Layton
  2013-03-19 13:11 ` [PATCH 6/6] nfsd: scale up the number of DRC hash buckets with cache size Jeff Layton
  5 siblings, 0 replies; 7+ messages in thread
From: Jeff Layton @ 2013-03-19 13:11 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

The typical case with the DRC is a cache miss, so if we keep track of
the max number of entries that we've ever walked over in a search, then
we should have a reasonable estimate of the longest hash chain that
we've ever seen.

With that, we'll also keep track of the total size of the cache when we
see the longest chain. In the case of a tie, we prefer to track the
smallest total cache size in order to properly gauge the worst-case
ratio of max vs. avg chain length.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/nfsd/nfscache.c | 31 +++++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index fd81ca7..17cb0d6 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -41,6 +41,12 @@ static unsigned int		payload_misses;
 /* amount of memory (in bytes) currently consumed by the DRC */
 static unsigned int		drc_mem_usage;
 
+/* longest hash chain seen */
+static unsigned int		longest_chain;
+
+/* size of cache when we saw the longest hash chain */
+static unsigned int		longest_chain_cachesize;
+
 /*
  * Calculate the hash index from an XID.
  */
@@ -319,15 +325,30 @@ nfsd_cache_match(struct svc_rqst *rqstp, __wsum csum, struct svc_cacherep *rp)
 static struct svc_cacherep *
 nfsd_cache_search(struct svc_rqst *rqstp, __wsum csum)
 {
-	struct svc_cacherep	*rp;
+	struct svc_cacherep	*rp, *ret = NULL;
 	struct hlist_head 	*rh;
+	unsigned int		entries = 0;
 
 	rh = &cache_hash[request_hash(rqstp->rq_xid)];
 	hlist_for_each_entry(rp, rh, c_hash) {
-		if (nfsd_cache_match(rqstp, csum, rp))
-			return rp;
+		++entries;
+		if (nfsd_cache_match(rqstp, csum, rp)) {
+			ret = rp;
+			break;
+		}
+	}
+
+	/* tally hash chain length stats */
+	if (entries > longest_chain) {
+		longest_chain = entries;
+		longest_chain_cachesize = num_drc_entries;
+	} else if (entries == longest_chain) {
+		/* prefer to keep the smallest cachesize possible here */
+		longest_chain_cachesize = min(longest_chain_cachesize,
+						num_drc_entries);
 	}
-	return NULL;
+
+	return ret;
 }
 
 /*
@@ -573,6 +594,8 @@ static int nfsd_reply_cache_stats_show(struct seq_file *m, void *v)
 	seq_printf(m, "cache misses:          %u\n", nfsdstats.rcmisses);
 	seq_printf(m, "not cached:            %u\n", nfsdstats.rcnocache);
 	seq_printf(m, "payload misses:        %u\n", payload_misses);
+	seq_printf(m, "longest chain len:     %u\n", longest_chain);
+	seq_printf(m, "cachesize at longest:  %u\n", longest_chain_cachesize);
 	spin_unlock(&cache_lock);
 	return 0;
 }
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 6/6] nfsd: scale up the number of DRC hash buckets with cache size
  2013-03-19 13:11 [PATCH 0/6] nfsd: duplicate reply cache improvements for 3.10 Jeff Layton
                   ` (4 preceding siblings ...)
  2013-03-19 13:11 ` [PATCH 5/6] nfsd: keep stats on worst hash balancing seen so far Jeff Layton
@ 2013-03-19 13:11 ` Jeff Layton
  5 siblings, 0 replies; 7+ messages in thread
From: Jeff Layton @ 2013-03-19 13:11 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

We've now increased the size of the duplicate reply cache by quite a
bit, but the number of hash buckets has not changed. So, we've gone from
an average hash chain length of 16 in the old code to 4096 when the
cache is its largest. Change the code to scale out the number of buckets
with the max size of the cache.

At the same time, we also need to fix the hash function since the
existing one isn't really suitable when there are more than 256 buckets.
Move instead to use the stock hash_32 function for this. Testing on a
machine that had 2048 buckets showed that this gave a smaller
longest:average ratio than the existing hash function:

The formula here is longest hash bucket searched divided by average
number of entries per bucket at the time that we saw that longest
bucket:

    old hash: 68/(39258/2048) == 3.547404
    hash_32:  45/(33773/2048) == 2.728807

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/nfsd/nfscache.c | 44 +++++++++++++++++++++++++++++---------------
 1 file changed, 29 insertions(+), 15 deletions(-)

diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index 17cb0d6..eb25877 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -11,6 +11,8 @@
 #include <linux/slab.h>
 #include <linux/sunrpc/addr.h>
 #include <linux/highmem.h>
+#include <linux/log2.h>
+#include <linux/hash.h>
 #include <net/checksum.h>
 
 #include "nfsd.h"
@@ -18,7 +20,12 @@
 
 #define NFSDDBG_FACILITY	NFSDDBG_REPCACHE
 
-#define HASHSIZE		64
+/*
+ * We use this value to determine the number of hash buckets from the max
+ * cache size, the idea being that when the cache is at its maximum number
+ * of entries, then this should be the average number of entries per bucket.
+ */
+#define TARGET_BUCKET_SIZE	64
 
 static struct hlist_head *	cache_hash;
 static struct list_head 	lru_head;
@@ -27,6 +34,9 @@ static struct kmem_cache	*drc_slab;
 /* max number of entries allowed in the cache */
 static unsigned int		max_drc_entries;
 
+/* number of significant bits in the hash value */
+static unsigned int		maskbits;
+
 /*
  * Stats and other tracking of on the duplicate reply cache. All of these and
  * the "rc" fields in nfsdstats are protected by the cache_lock
@@ -47,16 +57,6 @@ static unsigned int		longest_chain;
 /* size of cache when we saw the longest hash chain */
 static unsigned int		longest_chain_cachesize;
 
-/*
- * Calculate the hash index from an XID.
- */
-static inline u32 request_hash(u32 xid)
-{
-	u32 h = xid;
-	h ^= (xid >> 24);
-	return h & (HASHSIZE-1);
-}
-
 static int	nfsd_cache_append(struct svc_rqst *rqstp, struct kvec *vec);
 static void	cache_cleaner_func(struct work_struct *unused);
 static int 	nfsd_reply_cache_shrink(struct shrinker *shrink,
@@ -103,6 +103,16 @@ nfsd_cache_size_limit(void)
 	return min_t(unsigned int, limit, 256*1024);
 }
 
+/*
+ * Compute the number of hash buckets we need. Divide the max cachesize by
+ * the "target" max bucket size, and round up to next power of two.
+ */
+static unsigned int
+nfsd_hashsize(unsigned int limit)
+{
+	return roundup_pow_of_two(limit / TARGET_BUCKET_SIZE);
+}
+
 static struct svc_cacherep *
 nfsd_reply_cache_alloc(void)
 {
@@ -143,9 +153,13 @@ nfsd_reply_cache_free(struct svc_cacherep *rp)
 
 int nfsd_reply_cache_init(void)
 {
+	unsigned int hashsize;
+
 	INIT_LIST_HEAD(&lru_head);
 	max_drc_entries = nfsd_cache_size_limit();
 	num_drc_entries = 0;
+	hashsize = nfsd_hashsize(max_drc_entries);
+	maskbits = ilog2(hashsize);
 
 	register_shrinker(&nfsd_reply_cache_shrinker);
 	drc_slab = kmem_cache_create("nfsd_drc", sizeof(struct svc_cacherep),
@@ -153,7 +167,7 @@ int nfsd_reply_cache_init(void)
 	if (!drc_slab)
 		goto out_nomem;
 
-	cache_hash = kcalloc(HASHSIZE, sizeof(struct hlist_head), GFP_KERNEL);
+	cache_hash = kcalloc(hashsize, sizeof(struct hlist_head), GFP_KERNEL);
 	if (!cache_hash)
 		goto out_nomem;
 
@@ -204,7 +218,7 @@ static void
 hash_refile(struct svc_cacherep *rp)
 {
 	hlist_del_init(&rp->c_hash);
-	hlist_add_head(&rp->c_hash, cache_hash + request_hash(rp->c_xid));
+	hlist_add_head(&rp->c_hash, cache_hash + hash_32(rp->c_xid, maskbits));
 }
 
 static inline bool
@@ -329,7 +343,7 @@ nfsd_cache_search(struct svc_rqst *rqstp, __wsum csum)
 	struct hlist_head 	*rh;
 	unsigned int		entries = 0;
 
-	rh = &cache_hash[request_hash(rqstp->rq_xid)];
+	rh = &cache_hash[hash_32(rqstp->rq_xid, maskbits)];
 	hlist_for_each_entry(rp, rh, c_hash) {
 		++entries;
 		if (nfsd_cache_match(rqstp, csum, rp)) {
@@ -588,7 +602,7 @@ static int nfsd_reply_cache_stats_show(struct seq_file *m, void *v)
 	spin_lock(&cache_lock);
 	seq_printf(m, "max entries:           %u\n", max_drc_entries);
 	seq_printf(m, "num entries:           %u\n", num_drc_entries);
-	seq_printf(m, "hash buckets:          %u\n", HASHSIZE);
+	seq_printf(m, "hash buckets:          %u\n", 1 << maskbits);
 	seq_printf(m, "mem usage:             %u\n", drc_mem_usage);
 	seq_printf(m, "cache hits:            %u\n", nfsdstats.rchits);
 	seq_printf(m, "cache misses:          %u\n", nfsdstats.rcmisses);
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2013-03-19 13:11 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-03-19 13:11 [PATCH 0/6] nfsd: duplicate reply cache improvements for 3.10 Jeff Layton
2013-03-19 13:11 ` [PATCH 1/6] nfsd: eliminate one of the DRC cache searches Jeff Layton
2013-03-19 13:11 ` [PATCH 2/6] nfsd: break out comparator into separate function Jeff Layton
2013-03-19 13:11 ` [PATCH 3/6] nfsd: track memory utilization by the DRC Jeff Layton
2013-03-19 13:11 ` [PATCH 4/6] nfsd: add new reply_cache_stats file in nfsdfs Jeff Layton
2013-03-19 13:11 ` [PATCH 5/6] nfsd: keep stats on worst hash balancing seen so far Jeff Layton
2013-03-19 13:11 ` [PATCH 6/6] nfsd: scale up the number of DRC hash buckets with cache size Jeff Layton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).