From: "Chuck Lever" <chucklever@gmail.com>
To: "Andrew Bell" <andrew.bell.ia@gmail.com>
Cc: "Peter Staubach" <staubach@redhat.com>, linux-nfs@vger.kernel.org
Subject: Re: Performance Diagnosis
Date: Tue, 15 Jul 2008 13:20:49 -0400
Message-ID: <76bd70e30807151020j6cefbe71p8ce156b1c8fb2d86@mail.gmail.com>
In-Reply-To: <e80abd30807150934tc14e793ydd7aae44b4c3111b-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On Tue, Jul 15, 2008 at 12:34 PM, Andrew Bell <andrew.bell.ia@gmail.com> wrote:
> On Tue, Jul 15, 2008 at 11:23 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
>> On Tue, Jul 15, 2008 at 11:58 AM, Peter Staubach <staubach@redhat.com> wrote:
>>> If it is the notion described above, sometimes called head
>>> of line blocking, then we could think about ways to duplex
>>> operations over multiple TCP connections, perhaps with one
>>> connection for small, low latency operations, and another
>>> connection for larger, higher latency operations.
>>
>> I've dreamed about that for years. I don't think it would be too
>> difficult, but one thing that has held it back is that the shortage
>> of ephemeral ports on the client may reduce the number of concurrent
>> mount points we can support.
>
> Could one come up with a way to insert "small" ops somewhere in middle
> of the existing queue, or are the TCP send buffers typically too deep
> for this to do much good? Seems like more than one connection would
> allow "good" servers to handle requests simultaneously anyway.
There are several queues inside the NFS client stack.
The underlying RPC client manages a slot table. Each slot contains
one pending RPC request; i.e., the RPC has been sent and the slot is
held while the client waits for the reply. The table contains 16
slots by default. You can adjust the size (up to 128 slots) via a
sysctl, which may help your situation by allowing more reads or
writes to go to the server at once.
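For example, a sketch of adjusting that sysctl on a Linux client (the
sysctl name is sunrpc.tcp_slot_table_entries; requires root, and the
new value only applies to mounts established after the change):

```shell
# Check the current RPC slot table size (default is 16)
sysctl sunrpc.tcp_slot_table_entries

# Raise it to the maximum of 128; remount (or reboot) for
# existing mounts to pick up the new value
sysctl -w sunrpc.tcp_slot_table_entries=128

# To make it persistent across reboots, add to /etc/sysctl.conf:
#   sunrpc.tcp_slot_table_entries = 128
```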
The RPC client sends a single RPC on the socket at a time. (Waiting
for the reply is asynchronous, so the next request can be sent on the
socket as soon as the current one has been fully transmitted.)
Especially for large requests, this may mean waiting for the socket
buffer to drain before more data can be sent. The socket is held for
each request until that request is entirely sent, so that data from
different requests is not intermingled. If the network is not
congested, this is generally not a problem, but if the server is
backed up, it can take a while before the buffer is ready for more
data from a single large request.
Before an RPC gets into a slot, though, it waits on a backlog queue.
This queue can grow quite long in situations where there are a lot of
reads or writes and the server or network is slow.
The Python scripts I mentioned before have information about the
backlog queue size, slot table utilization, and per-operation average
latency. So you can clearly determine what the client is waiting for.
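As a rough illustration of what those scripts compute, here is a
sketch that derives the average backlog queue depth from the
"xprt: tcp" line of /proc/self/mountstats. The field order shown is
my assumption about the mountstats format, and the sample values are
fabricated:

```shell
# Sample "xprt: tcp" line (fabricated values). Assumed field order
# after "xprt: tcp": srcport, bind_count, connect_count, connect_time,
# idle_time, sends, recvs, bad_xids, req_u (cumulative active
# requests), bklog_u (cumulative backlog queue utilization).
line="xprt: tcp 801 0 1 0 0 100000 99990 0 500000 250000"

# Average backlog depth = cumulative backlog / number of sends
echo "$line" | awk '{ printf "avg backlog per RPC: %.1f\n", $12 / $8 }'
```

A persistently high average here suggests requests are piling up
waiting for a free slot, rather than waiting on the server itself.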
> Is there really that big a shortage of ephemeral ports?
Yes. The NFS client uses only privileged ports (although you can
optionally tell it to use non-privileged ports as well). For
long-lived sockets (such as transport sockets for NFS), it is careful
to choose privileged ports that do not belong to a "well known"
service (e.g., port 22 is the standard ssh service port). So the
default port range is roughly between 670 and 1023.
>> One way to avoid the port issue is to construct an SCTP transport for
>> NFS. SCTP allows multiple streams on the same connection, effectively
>> eliminating head of line blocking.
>
> Waiting for SCTP sounds like a long-term solution, as server vendors
> probably have little incentive.
Yep.
> Thanks for the ideas. I'll have to see what kind of time I can get to
> investigate this stuff.
We neglected to mention that you can also increase the number of NFSD
threads on your server. I think eight is the default, and often that
isn't enough.
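On the server side, a sketch of checking and raising the thread count
(paths and the chosen count of 32 are illustrative, and vary by
distribution):

```shell
# Number of nfsd threads currently running
cat /proc/fs/nfsd/threads

# Raise the count immediately (32 here is a hypothetical value;
# tune based on observed server load)
rpc.nfsd 32

# On Red Hat-style systems, make it persistent in /etc/sysconfig/nfs:
#   RPCNFSDCOUNT=32
```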
--
Chuck Lever