public inbox for linux-kernel@vger.kernel.org
From: Greg Banks <gnb@melbourne.sgi.com>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Michael Shuey <shuey@purdue.edu>,
	linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org,
	rees@citi.umich.edu, aglo@citi.umich.edu
Subject: Re: high latency NFS
Date: Mon, 04 Aug 2008 18:04:34 +1000	[thread overview]
Message-ID: <4896B812.7000006@melbourne.sgi.com> (raw)
In-Reply-To: <20080730192110.GA17061@fieldses.org>


J. Bruce Fields wrote:
> You might get more responses from the linux-nfs list (cc'd).
>
> --b.
>
> On Thu, Jul 24, 2008 at 01:11:31PM -0400, Michael Shuey wrote:
>   
>>
>> iozone is reading/writing a file twice the size of memory on the client with 
>> a 32k block size.  I've tried raising this as high as 16 MB, but I still 
>> see around 6 MB/sec reads.
>>     
That won't make a skerrick of difference with wsize=32K.
>> I'm using a 2.6.9 derivative (yes, I'm a RHEL4 fan).  Testing with a stock 
>> 2.6, client and server, is the next order of business.
>>
>> NFS mount is tcp, version 3.  rsize/wsize are 32k.
Try wsize=rsize=1M.
>>   Both client and server 
>> have had tcp_rmem, tcp_wmem, wmem_max, rmem_max, wmem_default, and 
>> rmem_default tuned - tuning values are 12500000 for defaults (and minimum 
>> window sizes), 25000000 for the maximums.  Inefficient, yes, but I'm not 
>> concerned with memory efficiency at the moment.
>>     
You're aware that the server screws these up again, at least for
writes?  There was a long sequence of threads on linux-nfs about this
recently, starting with

http://marc.info/?l=linux-nfs&m=121312415114958&w=2

which is Dean Hildebrand posting a patch to make the knfsd behaviour
tunable.  ToT still looks broken.  I've been using the attached patch (I
believe a similar one was posted later in the thread by Olga
Kornievskaia)  for low-latency high-bandwidth 10ge performance work,
where it doesn't help but doesn't hurt either.  It should help for your
high-latency high-bandwidth case.  Keep your tunings though; one of
them will affect the TCP window scale negotiated at connect time.
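For concreteness, the client-side setup I mean looks like the
following (server name, export path and mount point are placeholders;
scale the buffer numbers to your own bandwidth-delay product):

```shell
# Illustrative only -- "server:/export" and "/mnt" are placeholders.
# Large rsize/wsize means fewer RPCs in flight are needed to fill the pipe:
mount -t nfs -o tcp,vers=3,rsize=1048576,wsize=1048576 server:/export /mnt

# The TCP window scale factor is negotiated from these at connect
# time, so they must be large on BOTH client and server before the
# mount is established:
sysctl -w net.ipv4.tcp_rmem="12500000 12500000 25000000"
sysctl -w net.ipv4.tcp_wmem="12500000 12500000 25000000"
sysctl -w net.core.rmem_max=25000000
sysctl -w net.core.wmem_max=25000000
```

Remounting after changing the sysctls matters: an already-established
connection keeps the window scale it negotiated.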
>> Both client and server kernels have been modified to provide 
>> larger-than-normal RPC slot tables.  I allow a max of 1024, but I've found 
>> that actually enabling more than 490 entries in /proc causes mount to 
>> complain it can't allocate memory and die.  That was somewhat surprising, 
>> given I had 122 GB of free memory at the time...
>>     
That number is used to size a physically contiguous kmalloc()ed array of
slots.  With a large wsize you don't need such large slot table sizes or
large numbers of nfsds to fill the pipe.

And yes, the default number of nfsds is utterly inadequate.
>> I've also applied a couple patches to allow the NFS readahead to be a 
>> tunable number of RPC slots. 
There's a patch in SLES to do that, which I'd very much like to see
in kernel.org (Neil?).  The default NFS readahead multiplier value is
pessimal and guarantees worst-case alignment of READ rpcs during
streaming reads, so we tune it from 15 to 16.

-- 
Greg Banks, P.Engineer, SGI Australian Software Group.
The cake is *not* a lie.
I don't speak for SGI.


[-- Attachment #2: knfsd-tcp-receive-buffer-scaling --]
[-- Type: text/plain, Size: 970 bytes --]

Index: linux-2.6.16/net/sunrpc/svcsock.c
===================================================================
--- linux-2.6.16.orig/net/sunrpc/svcsock.c	2008-06-16 15:39:01.774672997 +1000
+++ linux-2.6.16/net/sunrpc/svcsock.c	2008-06-16 15:45:06.203421620 +1000
@@ -1157,13 +1159,13 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
 		 * particular pool, which provides an upper bound
 		 * on the number of threads which will access the socket.
 		 *
-		 * rcvbuf just needs to be able to hold a few requests.
-		 * Normally they will be removed from the queue 
-		 * as soon a a complete request arrives.
+		 * rcvbuf needs the same room as sndbuf, to allow
+		 * workloads comprising mostly WRITE calls to flow
+		 * at a reasonable fraction of line speed.
 		 */
 		svc_sock_setbufsize(svsk->sk_sock,
 				    (serv->sv_nrthreads+3) * serv->sv_bufsz,
-				    3 * serv->sv_bufsz);
+				    (serv->sv_nrthreads+3) * serv->sv_bufsz);
 
 	svc_sock_clear_data_ready(svsk);
 

