public inbox for linux-nfs@vger.kernel.org
From: Michael Shuey <shuey@purdue.edu>
To: Shehjar Tikoo <shehjart@cse.unsw.edu.au>
Cc: "J. Bruce Fields" <bfields@fieldses.org>,
	linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org,
	rees@citi.umich.edu, aglo@citi.umich.edu
Subject: Re: high latency NFS
Date: Wed, 30 Jul 2008 22:35:49 -0400	[thread overview]
Message-ID: <200807302235.50068.shuey@purdue.edu> (raw)
In-Reply-To: <4890DFC7.3020309@cse.unsw.edu.au>

Thanks for all the tips I've received this evening.  However, I figured out 
the problem late last night. :-)

I was only using the default 8 nfsd threads on the server.  When I raised 
this to 256, the read bandwidth went from about 6 MB/sec to about 95 
MB/sec, at 100ms of netem-induced latency.  Not too shabby.  I can get 
about 993 Mbps on the gigE link between client and server, or 124 MB/sec 
max, so this is about 76% of wire speed.  Network connections pass through 
three switches, at least one of which acts as a router, so I'm feeling 
pretty good about things so far.
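[For reference, a sketch of how the thread count can be checked and raised on a 2.6-era server; the RPCNFSDCOUNT variable is the RHEL-style persistence mechanism and may differ on other distros:]

```shell
# Check the current number of nfsd threads
cat /proc/fs/nfsd/threads

# Raise the count at runtime; takes effect immediately
echo 256 > /proc/fs/nfsd/threads

# Equivalently, via the userspace tool
rpc.nfsd 256

# To persist across reboots on RHEL-style distros, set
# RPCNFSDCOUNT=256 in /etc/sysconfig/nfs
```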

FYI, the server is using an ext3 file system, on top of a 10 GB /dev/ram0 
ramdisk (exported async, mounted async).  Oddly enough, /dev/ram0 seems a 
bit slower than tmpfs and a loopback-mounted file - go figure.
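[A sketch of the server-side setup described above; the mount point and the client hostname in the exports line are placeholders, and the ramdisk size assumes a suitable ramdisk_size= boot parameter:]

```shell
# Build ext3 on the 10 GB ramdisk and mount it async
mkfs.ext3 -q /dev/ram0
mkdir -p /export/ramdisk
mount -o async /dev/ram0 /export/ramdisk

# /etc/exports entry -- async lets the server ack writes before they
# reach stable storage:
#   /export/ramdisk  client.example.com(rw,async,no_subtree_check)
exportfs -ra
```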

To avoid confusing this with cache effects, I'm using iozone on an 8GB file 
from a client with only 4GB of memory.  Like I said, I'm mainly interested 
in large file performance. :-)
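[The iozone invocation for such a run would look roughly like this; the mount point is a placeholder:]

```shell
# Sequential write (-i 0) then read (-i 1) of an 8 GB file in 32 KB
# records; the file is twice client RAM, so reads can't come from cache
iozone -i 0 -i 1 -s 8g -r 32k -f /mnt/nfs/iozone.tmp
```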

-- 
Mike Shuey
Purdue University/ITaP


On Wednesday 30 July 2008, Shehjar Tikoo wrote:
> J. Bruce Fields wrote:
> > You might get more responses from the linux-nfs list (cc'd).
> >
> > --b.
> >
> > On Thu, Jul 24, 2008 at 01:11:31PM -0400, Michael Shuey wrote:
> >> I'm currently toying with Linux's NFS, to see just how fast it
> >> can go in a high-latency environment.  Right now, I'm simulating
> >>  a 100ms delay between client and server with netem (just 100ms
> >> on the outbound packets from the client, rather than 50ms each
> >> way). Oddly enough, I'm running into performance problems. :-)
> >>
> >> According to iozone, my server can sustain about 90/85 MB/s
> >> (reads/writes) without any latency added.  After a pile of
> >> tweaks, and injecting 100ms of netem latency, I'm getting 6/40
> >> MB/s (reads/writes).  I'd really like to know why writes are now
> >>  so much faster than reads, and what sort of things might boost
> >> the read throughput.  Any suggestions?
>
> Is the server sync or async mounted? I've seen such performance
> inversion between read and write when the mount mode is async.
>
> What is the number of nfsd threads at the server?
>
> Which file system are you using at the server?
>
> >> The read throughput seems to be proportional to the latency -
> >> adding only 10ms of delay gives 61 MB/s reads, in limited testing
> >>  (need to look at it further).  While that's to be expected, to
> >> some extent, I'm hoping there's some form of readahead that can
> >> help me out here (assume big sequential reads).
> >>
> >> iozone is reading/writing a file twice the size of memory on the
> >>  client with a 32k block size.  I've tried raising this as high
> >> as 16 MB, but I still see around 6 MB/sec reads.
>
> In iozone, are you running the read and write tests during the same
> run? iozone runs its read tests after the writes, so that the file
> for the read test exists on the server. You should run the write and
> read tests in separate runs, to keep client-side caching from
> influencing raw server read (and read-ahead) performance. The -w
> option prevents iozone from unlinking the file after the write test
> finishes, so you can reuse the same file in a separate read-only run.
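[Sketching that suggestion out (flags as above; the remount is one way to drop the client's page cache between runs):]

```shell
# Write test alone; -w keeps the file on the server afterwards
iozone -i 0 -w -s 8g -r 32k -f /mnt/nfs/iozone.tmp

# Remount the client to drop cached pages, then read separately
umount /mnt/nfs && mount /mnt/nfs
iozone -i 1 -w -s 8g -r 32k -f /mnt/nfs/iozone.tmp
```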
>
> >> I'm using a 2.6.9 derivative (yes, I'm a RHEL4 fan).  Testing
> >> with a stock 2.6, client and server, is the next order of
> >> business.
>
> You can try building the kernel with oprofile support and use it to
> measure where the client CPU is spending its time. It is possible that
> client-side locking or other algorithm issues are resulting in such
> low read throughput. Note, when you start oprofile profiling, use a
> CPU_CYCLES count of 5000. I've observed more accurate results with
> this sampling interval for NFS performance.
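[A rough opcontrol session for that (legacy oprofile 0.9-era tooling, matching a 2.6.9 kernel; the event name varies by CPU model and the vmlinux path is an assumption):]

```shell
opcontrol --init
opcontrol --setup --vmlinux=/boot/vmlinux --event=CPU_CYCLES:5000
opcontrol --start
# ... run the iozone read test on the client ...
opcontrol --stop
opreport --symbols | head -30   # top symbols by sample count
```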
>
> >> NFS mount is tcp, version 3.  rsize/wsize are 32k.  Both client
> >> and server have had tcp_rmem, tcp_wmem, wmem_max, rmem_max,
> >> wmem_default, and rmem_default tuned - tuning values are 12500000
> >>  for defaults (and minimum window sizes), 25000000 for the
> >> maximums.  Inefficient, yes, but I'm not concerned with memory
> >> efficiency at the moment.
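[The buffer tuning described above, expressed as sysctl settings with the values from the mail (12500000 for defaults/minimums, 25000000 for maximums):]

```shell
sysctl -w net.core.rmem_max=25000000
sysctl -w net.core.wmem_max=25000000
sysctl -w net.core.rmem_default=12500000
sysctl -w net.core.wmem_default=12500000
# min / default / max for TCP's own autotuning
sysctl -w net.ipv4.tcp_rmem="12500000 12500000 25000000"
sysctl -w net.ipv4.tcp_wmem="12500000 12500000 25000000"
```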
> >>
> >> Both client and server kernels have been modified to provide
> >> larger-than-normal RPC slot tables.  I allow a max of 1024, but
> >> I've found that actually enabling more than 490 entries in /proc
> >>  causes mount to complain it can't allocate memory and die.  That
> >>  was somewhat surprising, given I had 122 GB of free memory at the
> >>  time...
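[With such patches applied, the knob in question would be the sunrpc slot table entry in /proc (stock kernels of that era cap this at 128, so values like 490 assume the patched maximum):]

```shell
echo 490 > /proc/sys/sunrpc/tcp_slot_table_entries
cat /proc/sys/sunrpc/tcp_slot_table_entries
```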
> >>
> >> I've also applied a couple patches to allow the NFS readahead to
> >>  be a tunable number of RPC slots.  Currently, I set this to 489
> >>  on client and server (so it's one less than the max number of
> >> RPC slots).  Bandwidth delay product math says 380ish slots
> >> should be enough to keep a gigabit line full, so I suspect
> >> something else is preventing me from seeing the readahead I
> >> expect.
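[Spelling the slot math out, taking 125 MB/s as the theoretical gigabit payload ceiling:]

```shell
# Bandwidth-delay product: bytes in flight at wire speed, divided by
# the 32 KiB rsize, gives the RPC slots needed to keep the pipe full
bytes_per_sec=125000000   # gigabit ethernet, ~125 MB/s
rtt_ms=100
rsize=32768
bdp=$(( bytes_per_sec * rtt_ms / 1000 ))
slots=$(( bdp / rsize ))
echo "$slots"             # ~381, matching the 380ish estimate
```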
> >>
> >> FYI, client and server are connected via gigabit ethernet.
> >> There's a couple routers in the way, but they talk at 10gigE and
> >>  can route wire speed. Traffic is IPv4, path MTU size is 9000
> >> bytes.
>
> The following are not completely relevant here, but just to gather
> some more info:
> What is the raw TCP throughput that you get between the server and
> client machine on this network?
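[One way to get that baseline is iperf (netperf would work too); "server" below is a placeholder hostname:]

```shell
# On the NFS server:
iperf -s

# On the client: 30-second TCP test, window matching the tuned buffers
iperf -c server -t 30 -w 12500000
```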
>
> You could run the tests with bare minimum number of network
> elements between the server and the client to see whats the best
> network performance for NFS you can extract from this server and
> client machine.
>
> >> Is there anything I'm missing?
> >>
> >> -- Mike Shuey Purdue University/ITaP


Thread overview: 19+ messages
     [not found] <200807241311.31457.shuey@purdue.edu>
     [not found] ` <200807241311.31457.shuey-olO2ZdjDehc3uPMLIKxrzw@public.gmane.org>
2008-07-30 19:21   ` high latency NFS J. Bruce Fields
2008-07-30 21:40     ` Shehjar Tikoo
2008-07-31  2:35       ` Michael Shuey [this message]
     [not found]         ` <200807302235.50068.shuey-olO2ZdjDehc3uPMLIKxrzw@public.gmane.org>
2008-07-31  3:15           ` J. Bruce Fields
2008-07-31  7:03             ` Neil Brown
     [not found]               ` <18577.25513.494821.481623-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2008-08-01  7:23                 ` Dave Chinner
2008-08-01 19:15                   ` J. Bruce Fields
2008-08-04  0:32                     ` Dave Chinner
2008-08-04  1:11                       ` J. Bruce Fields
2008-08-04  2:14                         ` Dave Chinner
2008-08-04  9:18                         ` Bernd Schubert
     [not found]                           ` <200808041118.19743.bs-PKu+Ek1N2UGzQB+pC5nmwQ@public.gmane.org>
2008-08-04  9:25                             ` Greg Banks
2008-08-04  1:29                       ` NeilBrown
     [not found]                         ` <52873.192.168.1.70.1217813385.squirrel-eq65iwfR9nKIECXXMXunQA@public.gmane.org>
2008-08-04  6:42                           ` Greg Banks
     [not found]                             ` <4896A4EE.9030706-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>
2008-08-04 19:07                               ` J. Bruce Fields
2008-08-05 10:51                                 ` Greg Banks
2008-08-01 19:23                   ` J. Bruce Fields
2008-08-04  0:38                     ` Dave Chinner
2008-08-04  8:04     ` Greg Banks
