From: Greg Banks <gnb@melbourne.sgi.com>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Michael Shuey <shuey@purdue.edu>,
linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org,
rees@citi.umich.edu, aglo@citi.umich.edu
Subject: Re: high latency NFS
Date: Mon, 04 Aug 2008 18:04:34 +1000
Message-ID: <4896B812.7000006@melbourne.sgi.com>
In-Reply-To: <20080730192110.GA17061@fieldses.org>
J. Bruce Fields wrote:
> You might get more responses from the linux-nfs list (cc'd).
>
> --b.
>
> On Thu, Jul 24, 2008 at 01:11:31PM -0400, Michael Shuey wrote:
>
>>
>> iozone is reading/writing a file twice the size of memory on the client with
>> a 32k block size. I've tried raising this as high as 16 MB, but I still
>> see around 6 MB/sec reads.
>>
That won't make a skerrick of difference with wsize=32K.
>> I'm using a 2.6.9 derivative (yes, I'm a RHEL4 fan). Testing with a stock
>> 2.6, client and server, is the next order of business.
>>
>> NFS mount is tcp, version 3. rsize/wsize are 32k.
Try wsize=rsize=1M.
>> Both client and server
>> have had tcp_rmem, tcp_wmem, wmem_max, rmem_max, wmem_default, and
>> rmem_default tuned - tuning values are 12500000 for defaults (and minimum
>> window sizes), 25000000 for the maximums. Inefficient, yes, but I'm not
>> concerned with memory efficiency at the moment.
>>
You're aware that the server overrides these again, at least for
writes? knfsd resets the socket buffer sizes itself (see the attached
patch). There was a long sequence of threads on linux-nfs about this
recently, starting with
http://marc.info/?l=linux-nfs&m=121312415114958&w=2
which is Dean Hildebrand posting a patch to make the knfsd behaviour
tunable. Top of tree still looks broken. I've been using the attached
patch (I believe a similar one was posted later in the thread by Olga
Kornievskaia) for low-latency high-bandwidth 10GbE performance work,
where it doesn't help but doesn't hurt either. It should help for your
high-latency high-bandwidth case. Keep your tunings though; one of
them will be affecting the TCP window scale negotiated at connect time.
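To make the effect of the attached patch concrete, here is a rough sketch of the arithmetic. The first two functions mirror the receive-buffer argument passed to svc_sock_setbufsize() before and after the patch. The third uses the usual rule that a single TCP connection cannot move more than one window of data per round trip. The function names are made up for illustration, and any numbers fed in (buffer size, thread count, round-trip time) are assumptions rather than values from this thread. The kernel may also scale what it is given, so treat the results as orders of magnitude only.

```c
#include <assert.h>
#include <stdint.h>

/* Receive-buffer argument before the patch: room for three requests. */
uint64_t rcvbuf_before(uint64_t bufsz)
{
    return 3 * bufsz;
}

/* Receive-buffer argument after the patch: same formula as the send side. */
uint64_t rcvbuf_after(uint64_t nrthreads, uint64_t bufsz)
{
    return (nrthreads + 3) * bufsz;
}

/*
 * Upper bound on throughput for one TCP connection, in bytes per second:
 * at most one window of data can be in flight per round trip.
 */
uint64_t max_bytes_per_sec(uint64_t window_bytes, uint64_t rtt_ms)
{
    assert(rtt_ms > 0);
    return window_bytes * 1000 / rtt_ms;
}
```

With a hypothetical 32768-byte sv_bufsz and 8 nfsd threads, the receive-buffer argument goes from 98304 to 360448 bytes. If the window were limited to 98304 bytes over an assumed 100 ms round trip, max_bytes_per_sec() caps WRITE traffic at a little under 1 MB/sec, whatever the link speed.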
>> Both client and server kernels have been modified to provide
>> larger-than-normal RPC slot tables. I allow a max of 1024, but I've found
>> that actually enabling more than 490 entries in /proc causes mount to
>> complain it can't allocate memory and die. That was somewhat surprising,
>> given I had 122 GB of free memory at the time...
>>
That number is used to size a physically contiguous kmalloc()ed array of
slots, so a big enough table can fail to allocate however much memory is
free. With a large wsize you don't need such large slot table sizes or
large numbers of nfsds to fill the pipe.
And yes, the default number of nfsds is utterly inadequate.
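A back-of-the-envelope sketch of why a larger wsize shrinks the slot table you need: the number of RPCs that must be in flight is roughly the bandwidth-delay product divided by the transfer size. The function names are illustrative only. The link speed and round-trip time used in the note below are assumptions, since neither is stated in this thread, and RPC header overhead is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Bandwidth-delay product: bytes that must be in flight to keep the link busy. */
uint64_t bdp_bytes(uint64_t bytes_per_sec, uint64_t rtt_ms)
{
    return bytes_per_sec * rtt_ms / 1000;
}

/*
 * Number of outstanding RPCs of xfer_bytes each (rsize or wsize) needed
 * to cover the bandwidth-delay product, rounded up.
 */
uint64_t rpcs_to_fill_pipe(uint64_t bytes_per_sec, uint64_t rtt_ms,
                           uint64_t xfer_bytes)
{
    assert(xfer_bytes > 0);
    uint64_t bdp = bdp_bytes(bytes_per_sec, rtt_ms);
    return (bdp + xfer_bytes - 1) / xfer_bytes;
}
```

Assuming a 1 Gbit/s link (125000000 bytes/sec) and a 100 ms round trip, the bandwidth-delay product is 12500000 bytes. That takes 382 outstanding RPCs at 32K but only 12 at 1M.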
>> I've also applied a couple patches to allow the NFS readahead to be a
>> tunable number of RPC slots.
There's a patch in SLES to do that, which I'd very much like to see
in kernel.org (Neil?). The default NFS readahead multiplier value is
pessimal and guarantees worst-case alignment of READ RPCs during
streaming reads, so we tune it from 15 to 16.
--
Greg Banks, P.Engineer, SGI Australian Software Group.
The cake is *not* a lie.
I don't speak for SGI.
[-- Attachment #2: knfsd-tcp-receive-buffer-scaling --]
[-- Type: text/plain, Size: 970 bytes --]
Index: linux-2.6.16/net/sunrpc/svcsock.c
===================================================================
--- linux-2.6.16.orig/net/sunrpc/svcsock.c 2008-06-16 15:39:01.774672997 +1000
+++ linux-2.6.16/net/sunrpc/svcsock.c 2008-06-16 15:45:06.203421620 +1000
@@ -1157,13 +1159,13 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
* particular pool, which provides an upper bound
* on the number of threads which will access the socket.
*
- * rcvbuf just needs to be able to hold a few requests.
- * Normally they will be removed from the queue
- * as soon a a complete request arrives.
+ * rcvbuf needs the same room as sndbuf, to allow
+ * workloads comprising mostly WRITE calls to flow
+ * at a reasonable fraction of line speed.
*/
svc_sock_setbufsize(svsk->sk_sock,
(serv->sv_nrthreads+3) * serv->sv_bufsz,
- 3 * serv->sv_bufsz);
+ (serv->sv_nrthreads+3) * serv->sv_bufsz);
svc_sock_clear_data_ready(svsk);