From: "J. Bruce Fields" <bfields@fieldses.org>
To: Shyam Kaushik <shyamnfs1@gmail.com>
Cc: linux-nfs@vger.kernel.org
Subject: Re: Need help with NFS Server SUNRPC performance issue
Date: Mon, 4 Nov 2013 18:02:44 -0500
Message-ID: <20131104230244.GD8828@fieldses.org>
In-Reply-To: <CA+uAZNO7VrBA1MLgGoqGpXGPHMMy0VF_3KBr93Y5w1M=ZO7s4w@mail.gmail.com>

On Fri, Nov 01, 2013 at 10:08:18AM +0530, Shyam Kaushik wrote:
> Hi Bruce,
> 
> Yes I am using NFSv4. I am willing to test any kernel/patches that you
> suggest. Please let me know where we can start. Also I have
> sunrpc/nfsd/lockd etc compiled as modules & can readily debug it as
> needed.

OK, thanks.  It would be worth trying to implement the comment at the
top of fs/nfsd/nfs4xdr.c:

	 * TODO: Neil Brown made the following observation:  We
	 * currently initially reserve NFSD_BUFSIZE space on the
	 * transmit queue and never release any of that until the
	 * request is complete.  It would be good to calculate a new
	 * maximum response size while decoding the COMPOUND, and call
	 * svc_reserve with this number at the end of
	 * nfs4svc_decode_compoundargs.

I think it shouldn't be too difficult--we just need to work out some
upper bounds on the reply size per operation.

A first approximation, just to test the idea, might be to call
svc_reserve(., 4096) on any compound not containing a read.
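
The first approximation above could be sketched in user-space C roughly as
follows. This is an illustration only, not kernel code and not a patch from
this thread: the sk_* names, the op enum and the worst-case constant are all
hypothetical stand-ins, and only the 4096 figure comes from the mail. In the
kernel, the returned value would be what gets passed as the second argument
to svc_reserve.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical op codes for this sketch; the real ones live in the NFSv4 headers. */
enum sketch_op { SK_OP_PUTFH, SK_OP_GETATTR, SK_OP_WRITE, SK_OP_READ };

/* Assumed figures, for illustration only. */
#define SK_SMALL_REPLY 4096u            /* the 4096 suggested in the mail */
#define SK_WORST_REPLY (1024u * 1024u)  /* stand-in, NOT the real NFSD_BUFSIZE */

/* True if any operation in the compound is a read. */
static bool sk_compound_has_read(const enum sketch_op *ops, size_t nops)
{
    for (size_t i = 0; i < nops; i++)
        if (ops[i] == SK_OP_READ)
            return true;
    return false;
}

/* Bytes to reserve for the reply: small unless the compound contains a read. */
static unsigned int sk_reply_reservation(const enum sketch_op *ops, size_t nops)
{
    assert(ops != NULL || nops == 0);
    return sk_compound_has_read(ops, nops) ? SK_WORST_REPLY : SK_SMALL_REPLY;
}
```

With this sketch, a PUTFH/WRITE/GETATTR compound like the one described in
the quoted report would get the small reservation, while anything containing
a read keeps the worst-case one.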

--b.

> 
> I dug into this a bit further & I think you are spot on that the issue is
> with the rpc layer + buffer space. From tcpdump I see that the initial
> requests come from client to server according to the number of
> outstanding IOs that fio initiates, but then there are multiple back &
> forth packets (RPC continuation & acks) that are slowing things down. I
> initially thought waking up the NFSD threads that are sleeping within
> svc_get_next_xprt() was the issue & gave schedule_timeout() a smaller
> timeout, but then all the threads woke up, saw there was no work
> enqueued & went back to sleep again. So from the sunrpc server
> standpoint enqueue() is not happening as it should.
> 
> In the meantime, on the NFS client side I see a single rpc thread that's
> working all the time.
> 
> Thanks.
> 
> --Shyam
> 
> 
> 
> On Thu, Oct 31, 2013 at 7:45 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > On Thu, Oct 31, 2013 at 12:19:01PM +0530, Shyam Kaushik wrote:
> >> Hi Folks,
> >>
> >> I am chasing a NFS server performance issue on Ubuntu
> >> 3.8.13-030813-generic kernel. We setup 32 NFSD threads on our NFS
> >> server.
> >>
> >> The issue is:
> >> # I am using fio to generate 4K random writes (over a sync mounted NFS
> >> server filesystem) with 64 outstanding IOs per job for 10 jobs. fio
> >> direct flag is set.
> >> # When doing fio randwrite 4K IOs, realized that we cannot exceed 2.5K
> >> IOPs on the NFS server from a single client.
> >> # With multiple clients we can do more IOPs (like 3x more IOPs with 3 clients)
> >> # Further chasing the issue, I realized that at any point in time only
> >> 8 NFSD threads are active doing vfs_write(). Remaining 24 threads are
> >> sleeping within svc_recv()/svc_get_next_xprt().
> >> # First I thought it's TCP socket contention/sleeping at the wrong
> >> time. I introduced a one-sec sleep around vfs_write() within NFSD
> >> using msleep(). With this I can clearly see that only 8 NFSD threads
> >> are active doing the write+sleep loop while all other threads are
> >> sleeping.
> >> # I enabled rpcdebug/nfs debug on NFS client side + used tcpdump on
> >> NFS server side to confirm that client is queuing all the outstanding
> >> IOs concurrently & it's not an NFS client side problem.
> >>
> >> Now the question is what is holding up the sunrpc layer to do only 8
> >> outstanding IOs? Is there some TCP level buffer size limitation or so
> >> that is causing this issue? I also added counters to track which
> >> nfsd threads get to process the SVC xport & I always see only the
> >> first 10 threads being used. The rest of the NFSD
> >> threads never receive a packet to handle at all.
> >>
> >> I already setup number of RPC slots tuneable to 128 on both server &
> >> client before the mount, so this is not the issue.
> >>
> >> Are there some other tuneables that control this behaviour? I think if
> >> I cross the 8 concurrent IOs per client<>server, I will be able to get
> >> >2.5K IOPs.
> >>
> >> I also confirmed that each NFS multi-step operation that comes from
> >> client has an OP_PUTFH/OP_WRITE/OP_GETATTR. I don't see any other
> >> unnecessary NFS packets in the flow.
> >>
> >> Any help/inputs on this topic greatly appreciated.
> >
> > There's some logic in the rpc layer that tries not to accept requests
> > unless there's adequate send buffer space for the worst case reply.  It
> > could be that logic interfering.....  I'm not sure how to test that
> > quickly.
> >
> > Would you be willing to test an upstream kernel and/or some patches?
> >
> > Sounds like you're using only NFSv4?
> >
> > --b.
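
To see why a worst-case reservation per request could cap concurrency, as
suggested in the quoted reply above, here is a deliberately simplified model
in C. It assumes each accepted request holds a fixed reservation against the
send buffer until its reply goes out; the real sunrpc check is more involved.
The function name and all sizes are hypothetical, chosen only so the
arithmetic is easy to follow, and are not measurements from this thread.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model: how many requests can be in flight at once if each
 * one holds reserve_per_req bytes of a sndbuf_bytes send buffer until
 * its reply has been sent.
 */
static size_t sk_max_in_flight(size_t sndbuf_bytes, size_t reserve_per_req)
{
    assert(reserve_per_req > 0);
    return sndbuf_bytes / reserve_per_req;
}
```

Under this model, a 4 MiB buffer with a 1 MiB reservation per request admits
4 requests at a time, while the same buffer with a 4096-byte reservation
admits 1024, which is the effect a smaller svc_reserve value is meant to have.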
