From: "J. Bruce Fields" <bfields@fieldses.org>
To: Shyam Kaushik <shyamnfs1@gmail.com>
Cc: linux-nfs@vger.kernel.org
Subject: Re: Need help with NFS Server SUNRPC performance issue
Date: Mon, 4 Nov 2013 18:02:44 -0500 [thread overview]
Message-ID: <20131104230244.GD8828@fieldses.org> (raw)
In-Reply-To: <CA+uAZNO7VrBA1MLgGoqGpXGPHMMy0VF_3KBr93Y5w1M=ZO7s4w@mail.gmail.com>
On Fri, Nov 01, 2013 at 10:08:18AM +0530, Shyam Kaushik wrote:
> Hi Bruce,
>
> Yes I am using NFSv4. I am willing to test any kernel/patches that you
> suggest. Please let me know where we can start. Also I have
> sunrpc/nfsd/lockd etc compiled as modules & can readily debug it as
> needed.
OK, thanks. It would be worth trying to implement the comment at the
top of fs/nfsd/nfs4xdr.c:
* TODO: Neil Brown made the following observation: We
* currently initially reserve NFSD_BUFSIZE space on the
* transmit queue and never release any of that until the
* request is complete. It would be good to calculate a new
* maximum response size while decoding the COMPOUND, and call
* svc_reserve with this number at the end of
* nfs4svc_decode_compoundargs.
I think it shouldn't be too difficult--we just need to work out some
upper bounds on the reply size per operation.
A first approximation, just to test the idea, might be to call
svc_reserve(., 4096) on any compound not containing a read.
--b.
>
> I dug into this a bit further & I think you are spot on that the issue is
> with the rpc layer + buffer space. From tcpdump I see that the initial
> requests come from client to server according to the number of
> outstanding IOs that fio initiates, but then there are multiple back &
> forth packets (RPC continuation & acks) that slow things up. I
> initially thought waking up the NFSD threads that are sleeping within
> svc_get_next_xprt() was the issue & gave the
> schedule_timeout() a smaller timeout, but then all the threads
> woke up, saw there was no work enqueued & went back to sleep again. So
> from the sunrpc server standpoint enqueue() is not happening as it should
> be.
>
> In the meantime, from the NFS client side I see a single rpc thread that's
> working all the time.
>
> Thanks.
>
> --Shyam
>
>
>
> On Thu, Oct 31, 2013 at 7:45 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > On Thu, Oct 31, 2013 at 12:19:01PM +0530, Shyam Kaushik wrote:
> >> Hi Folks,
> >>
> >> I am chasing a NFS server performance issue on Ubuntu
> >> 3.8.13-030813-generic kernel. We setup 32 NFSD threads on our NFS
> >> server.
> >>
> >> The issue is:
> >> # I am using fio to generate 4K random writes (over a sync mounted NFS
> >> server filesystem) with 64 outstanding IOs per job for 10 jobs. fio
> >> direct flag is set.
> >> # When doing fio randwrite 4K IOs, realized that we cannot exceed 2.5K
> >> IOPs on the NFS server from a single client.
> >> # With multiple clients we can do more IOPs (like 3x more IOPs with 3 clients)
> >> # Chasing the issue further, I realized that at any point in time only
> >> 8 NFSD threads are active doing vfs_write(). The remaining 24 threads are
> >> sleeping within svc_recv()/svc_get_next_xprt().
> >> # First I thought it was TCP socket contention/sleeping at the wrong
> >> time. I introduced a one-sec sleep around vfs_write() within NFSD
> >> using msleep(). With this I can clearly see that only 8 NFSD threads
> >> are active doing the write+sleep loop while all other threads are
> >> sleeping.
> >> # I enabled rpcdebug/nfs debug on NFS client side + used tcpdump on
> >> NFS server side to confirm that client is queuing all the outstanding
> >> IOs concurrently & its not a NFS client side problem.
> >>
> >> Now the question is what is holding up the sunrpc layer to do only 8
> >> outstanding IOs? Is there some TCP level buffer size limitation or so
> >> that is causing this issue? I also added counters around which all
> >> nfsd threads get to process the SVC xport & I see always only the
> >> first 10 threads being used up all the time. The rest of the NFSD
> >> threads never receive a packet at all to handle.
> >>
> >> I already set the RPC slots tuneable to 128 on both server &
> >> client before the mount, so this is not the issue.
> >>
> >> Are there some other tuneables that control this behaviour? I think if
> >> I cross the 8 concurrent IOs per client<>server, I will be able to get
> >> >2.5K IOPs.
> >>
> >> I also confirmed that each NFS multi-step operation that comes from
> >> client has an OP_PUTFH/OP_WRITE/OP_GETATTR. I don't see any other
> >> unnecessary NFS packets in the flow.
> >>
> >> Any help/inputs on this topic greatly appreciated.
> >
> > There's some logic in the rpc layer that tries not to accept requests
> > unless there's adequate send buffer space for the worst case reply. It
> > could be that logic interfering. I'm not sure how to test that
> > quickly.
> >
> > Would you be willing to test an upstream kernel and/or some patches?
> >
> > Sounds like you're using only NFSv4?
> >
> > --b.