Re: Need help with NFS Server SUNRPC performance issue

linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: "J. Bruce Fields" <bfields@fieldses.org>
To: Shyam Kaushik <shyamnfs1@gmail.com>
Cc: linux-nfs@vger.kernel.org
Subject: Re: Need help with NFS Server SUNRPC performance issue
Date: Wed, 13 Nov 2013 17:00:36 -0500	[thread overview]
Message-ID: <20131113220036.GO28033@fieldses.org> (raw)
In-Reply-To: <20131113162443.GK28033@fieldses.org>

On Wed, Nov 13, 2013 at 11:24:44AM -0500, J. Bruce Fields wrote:
> On Wed, Nov 06, 2013 at 12:57:38PM +0530, Shyam Kaushik wrote:
> > Hi Bruce,
> > 
> > This hack works great. All 32 of configured NFSD threads end up doing
> > nfsd_write() which is great & I get higher IOPs/bandwidth from NFS
> > client side.
> > 
> > What do you think if we vary the signature of
> > typedef __be32(*nfsd4_dec)(struct nfsd4_compoundargs *argp, void *);
> > 
> > to include an additional return argument of the size estimate. This
> > way we get size estimate from the decoders (like nfsd4_decode_read
> > could return this estimate as rd_length + overhead) & in the worst
> > case if decoder says cant estimate (like a special return code -1) we
> > dont do svc_reserve() & leave it like it is. So when we run through
> > the compound we have a sum of size estimate & just do svc_reserve() at
> > the end of nfsd4_decode_compound() like your hack has.
> > 
> > Does this sound reasonable to you? If not, perhaps can we just use the
> > hack for now & worry about readdir etc when they support >4K buffer?
> 
> Yep.  Actually looking at it again I think it needs a couple more
> special cases (for readlink, readdir), but that should be good enough
> for now.

So I'm planning to commit the following.

But eventually I agree we'd rather do the calculation in the decoder.
(Which would make it easier for example to take into account whether a
getattr op includes a request for an ACL.)

--b.

commit 6ff40decff0ef35a5d755ec60182d7f803356dfb
Author: J. Bruce Fields <bfields@redhat.com>
Date:   Tue Nov 5 15:07:16 2013 -0500

    nfsd4: improve write performance with better sendspace reservations
    
    Currently the rpc code conservatively refuses to accept rpc's from a
    client if the sum of its worst-case estimates of the replies it owes
    that client exceed the send buffer space.
    
    Unfortunately our estimate of the worst-case reply for an NFSv4 compound
    is always the maximum read size.  This can unnecessarily limit the
    number of operations we handle concurrently, for example in the case
    most operations are writes (which have small replies).
    
    We can do a little better if we check which ops the compound contains.
    
    This is still a rough estimate, we'll need to improve on it some day.
    
    Reported-by: Shyam Kaushik <shyamnfs1@gmail.com>
    Tested-by: Shyam Kaushik <shyamnfs1@gmail.com>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index d9d7fa9..9d76ee3 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1597,12 +1597,39 @@ nfsd4_opnum_in_range(struct nfsd4_compoundargs *argp, struct nfsd4_op *op)
 	return true;
 }
 
+/*
+ * Return a rough estimate of the maximum possible reply size.  Note the
+ * estimate includes rpc headers so is meant to be passed to
+ * svc_reserve, not svc_reserve_auth.
+ *
+ * Also note the current compound encoding permits only one operation to
+ * use pages beyond the first one, so the maximum possible length is the
+ * maximum over these values, not the sum.
+ */
+static int nfsd4_max_reply(u32 opnum)
+{
+	switch (opnum) {
+	case OP_READLINK:
+	case OP_READDIR:
+		/*
+		 * Both of these ops take a single page for data and put
+		 * the head and tail in another page:
+		 */
+		return 2 * PAGE_SIZE;
+	case OP_READ:
+		return INT_MAX;
+	default:
+		return PAGE_SIZE;
+	}
+}
+
 static __be32
 nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
 {
 	DECODE_HEAD;
 	struct nfsd4_op *op;
 	bool cachethis = false;
+	int max_reply = PAGE_SIZE;
 	int i;
 
 	READ_BUF(4);
@@ -1652,10 +1679,14 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
 		 * op in the compound wants to be cached:
 		 */
 		cachethis |= nfsd4_cache_this_op(op);
+
+		max_reply = max(max_reply, nfsd4_max_reply(op->opnum));
 	}
 	/* Sessions make the DRC unnecessary: */
 	if (argp->minorversion)
 		cachethis = false;
+	if (max_reply != INT_MAX)
+		svc_reserve(argp->rqstp, max_reply);
 	argp->rqstp->rq_cachetype = cachethis ? RC_REPLBUFF : RC_NOCACHE;
 
 	DECODE_TAIL;

next prev parent reply	other threads:[~2013-11-13 22:00 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-10-31  6:49 Need help with NFS Server SUNRPC performance issue Shyam Kaushik
2013-10-31 14:15 ` J. Bruce Fields
2013-10-31 14:45   ` Michael Richardson
2013-10-31 15:14     ` J. Bruce Fields
2013-11-01 19:09       ` Michael Richardson
2013-11-04 23:03         ` J. Bruce Fields
2013-11-01  4:43     ` Shyam Kaushik
2013-11-13  4:07     ` Shyam Kaushik
2013-11-13 16:18       ` Bruce Fields
2013-11-01  4:38   ` Shyam Kaushik
2013-11-04 23:02     ` J. Bruce Fields
2013-11-05 13:44       ` Shyam Kaushik
2013-11-05 19:58         ` J. Bruce Fields
2013-11-06  7:27           ` Shyam Kaushik
2013-11-13 16:24             ` J. Bruce Fields
2013-11-13 22:00               ` J. Bruce Fields [this message]
2013-11-14  4:23                 ` Shyam Kaushik

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:d9d7fa9 dfblob:9d76ee3 )
 OR (
bs:"Re: Need help with NFS Server SUNRPC performance issue" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131113220036.GO28033@fieldses.org \
    --to=bfields@fieldses.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=shyamnfs1@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).