From: Andreas Schuldei <andreas@schuldei.org>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: nfs@lists.sourceforge.net
Subject: Re: nfs performance problem
Date: Thu, 25 Oct 2007 21:34:57 +0200
Message-ID: <20071025193457.GE4499@jakobus.spotify.net>
In-Reply-To: <7B68ECC3-7EBA-442F-9FFD-A0E3F2DCC61A@oracle.com>
* Chuck Lever (chuck.lever@oracle.com) [071025 20:25]:
> On Oct 25, 2007, at 9:10 AM, Andreas Schuldei wrote:
> >Hi!
> >
> >I need to tune a nfs server and client. on the server we have
> >several Tbyte of ~2Mbyte files and we need to transfer them read
> >only to the client. latency and throughput are crucial.
Because i have Tbytes of data but only a few Gbytes of RAM,
cache hits are rather unlikely.
> >Right now i have only four disks in the server and i get 50Mbyte/s
> >out of each of them, simultaneously, for real world loads (random
> >reads across the disk, trying to minimize the seeks by reading
> >the files in one go with
> >
> >for i in a b h i ; do ( find /var/disks/sd$i -type f | xargs -I° dd if=° bs=2M of=/dev/null status=noxfer 2>/dev/null & ) ; done
> >
> >so with this (4*50 Mbyte/s) i should be able to saturate both
> >network cards.
note that this is my server's disk io performance.
> With a single client, you should not expect to get any better
> performance than by running the web service on the NFS server. The
> advantage of using NFS under a web service is that you can
> transparently scale horizontally. When you add a second or third web
> server that serves the same file set, you will see an effective
> increase in the size of the data cache between your NFS server's disks
> and the web servers.
Not with terabytes of data and a distributed access pattern.
Certainly i will have some cache hits but not enough to be able
to serve considerable amounts out of RAM.
> But don't expect to get better data throughput over NFS than you see
> on your local NFS server.
That is exactly the point. on my server i get 4*50Mbyte/s =
200Mbyte/s out of the disks (with the above for loop around the
find and dd), and when i export the disks from the same server to
an nfs client i all of a sudden lose ~75% of the performance.
> If anything, the 10s latency you see when the web server is on the
> same system with the disks is indicative of local file system
> configuration issues.
how can i measure the latency on the local machine? i would be
very interested in seeing how it behaves latency wise.
> >on the server i start 128 nfs servers (RPCNFSDCOUNT=128) and export
> >the disks like this:
> >
> >/usr/sbin/exportfs -v
> >/var/disks/sda <world>(ro,async,wdelay,root_squash,no_subtree_check,anonuid=65534,anongid=65534)
> >/var/disks/sdb <world>(ro,async,wdelay,root_squash,no_subtree_check,anonuid=65534,anongid=65534)
> >/var/disks/sdh <world>(ro,async,wdelay,root_squash,no_subtree_check,anonuid=65534,anongid=65534)
> >/var/disks/sdi <world>(ro,async,wdelay,root_squash,no_subtree_check,anonuid=65534,anongid=65534)
>
> On the server, mounting the web data file systems with "noatime" may
> help reduce the number of seeks on the disks.
yes, we do that already.
> >on the client i mount them like this:
> >
> >lotta:/var/disks/sda on /var/disks/sda type nfs (ro,hard,intr,proto=tcp,rsize=32k,addr=217.213.5.44)
> >lotta:/var/disks/sdb on /var/disks/sdb type nfs (ro,hard,intr,proto=tcp,rsize=32k,addr=217.213.5.44)
> >lotta:/var/disks/sdh on /var/disks/sdh type nfs (ro,hard,intr,proto=tcp,rsize=32k,addr=217.213.5.44)
> >lotta:/var/disks/sdi on /var/disks/sdi type nfs (ro,hard,intr,proto=tcp,rsize=32k,addr=217.213.5.44)
>
> There are some client-side mount options that might also help. Using
> "nocto" and "actimeo=7200" could reduce synchronous NFS protocol
> overhead. I also notice a significant amount of readdirplus traffic.
> Readdirplus requests are fairly heavyweight, and in this scenario may
> be unneeded overhead. Your client might support the recently added
> "nordirplus" mount option, which could be helpful.
>
> I wonder if "rsize=32k" is supported - you might want "rsize=32768"
> instead.
i think that had an effect. now i am in the 90-100Mbyte/s
ballpark and might hit the one-NIC (1gbit) bottleneck.
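
A rough sketch of the arithmetic behind that guess. The helper names
line_rate_mbyte and percent_of are made up for illustration, the units
are decimal (1 Gbit/s = 1000 Mbit/s), and all protocol overhead is
ignored, so the real ceiling is somewhat lower than the raw figure:

```shell
#!/bin/sh
# line_rate_mbyte: hypothetical helper. Converts a link speed given in
# Mbit/s to Mbyte/s (decimal units, no protocol overhead accounted for).
line_rate_mbyte() {
    echo $(( $1 / 8 ))
}

# percent_of: hypothetical helper. Prints which percentage of $2 the
# value $1 is, as an integer rounded down.
percent_of() {
    echo $(( $1 * 100 / $2 ))
}

limit=$(line_rate_mbyte 1000)
echo "1 Gbit/s raw line rate: ${limit} Mbyte/s"
echo "90 Mbyte/s is $(percent_of 90 "$limit")% of that"
echo "100 Mbyte/s is $(percent_of 100 "$limit")% of that"
```

With these assumptions the observed 90-100Mbyte/s already sits at
72-80% of the raw rate of a single 1gbit link.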
> Or better, let the client and server negotiate the maximum that each
> supports automatically by leaving this option off. You can check what
> options are in effect on each NFS mount point by looking in
> /proc/self/mountstats on the client.
there it says now, after i specified rsize=2097152:
opts: rw,vers=3,rsize=1048576,wsize=1048576,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,hard,intr,nolock,proto=tcp,timeo=600,retrans=2,sec=sys
i am surprised that it did not protest when it could not parse
the "k". note that it only took 1M chunks. how come?
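
Since the value reported in mountstats can differ from the one
requested at mount time, it is the number worth checking. A minimal
sketch of pulling it out of an opts line like the one above; the
function name effective_rsize is made up and not part of any NFS
tooling, and it only assumes the "opts:" line format shown in this
mail:

```shell
#!/bin/sh
# effective_rsize: hypothetical helper. Reads text in the format of
# /proc/self/mountstats on stdin and prints the rsize value found on
# each "opts:" line, one value per line.
effective_rsize() {
    sed -n 's/^[[:space:]]*opts:.*[,[:space:]]rsize=\([0-9]*\).*/\1/p'
}

# Example input: the opts line quoted in this mail.
sample='opts: rw,vers=3,rsize=1048576,wsize=1048576,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,hard,intr,nolock,proto=tcp,timeo=600,retrans=2,sec=sys'
printf '%s\n' "$sample" | effective_rsize
```

On a client the same function could be fed the real file with
"effective_rsize < /proc/self/mountstats", which prints one value per
NFS mount.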
> Enabling jumbo frames between your NFS server and client will help.
> Depending on your NIC, though, it may introduce some instability
> (driver and hardware mileage may vary).
i will test that and bonding two NICs.
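
As a back-of-the-envelope check of what jumbo frames can buy on the
wire, here is a sketch under stated assumptions: TCP over IPv4 without
options (40 bytes of headers inside the MTU) and 38 bytes of Ethernet
framing per packet (header, FCS, preamble, inter-frame gap). RPC and
NFS headers are ignored, and the function name tcp_efficiency_permille
is made up:

```shell
#!/bin/sh
# tcp_efficiency_permille: hypothetical helper. For a given MTU, prints
# the share of wire bytes that carry TCP payload, in parts per thousand.
# Assumes 40 bytes of IPv4+TCP headers and 38 bytes of Ethernet framing.
tcp_efficiency_permille() {
    mtu=$1
    echo $(( ($mtu - 40) * 1000 / ($mtu + 38) ))
}

for mtu in 1500 9000; do
    eff=$(tcp_efficiency_permille "$mtu")
    # 125 Mbyte/s is the raw rate of a 1 Gbit/s link in decimal units.
    echo "MTU $mtu: ${eff} per mille payload, about $(( 125 * eff / 1000 )) Mbyte/s on 1 Gbit/s"
done
```

By this estimate the gain in wire efficiency is only a few percent, so
any larger improvement would have to come from the hosts handling fewer
packets.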
> Insufficient read-ahead on your server may be an issue here. Read
> traffic from the client often arrives at the server out of order,
> preventing the server from cleanly detecting sequential reads. I
> believe there was a recent change to the NFS server that addresses
> this issue.
when did that go in? do i need to activate that somehow?
how can i measure the latency on a loaded server? both locally
and over nfs?
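
One crude way to get a first number, as a sketch only: time how long
dd needs to read the first block of a file. The function name
read_latency_ms is made up, GNU date is assumed for the %N format, and
the result includes the startup cost of dd and date themselves. Files
that are already in the page cache will look much faster than the
disks really are.

```shell
#!/bin/sh
# read_latency_ms: hypothetical helper. Prints the wall-clock time in
# milliseconds needed to read the first block of file $1.
# $2 is the block size passed to dd (default 4096 bytes).
# Assumes GNU date, which supports %N (nanoseconds).
read_latency_ms() {
    file=$1
    bs=${2:-4096}
    start=$(date +%s%N)
    dd if="$file" of=/dev/null bs="$bs" count=1 2>/dev/null || return 1
    end=$(date +%s%N)
    echo $(( ($end - $start) / 1000000 ))
}

# Possible use while the server is under load (paths as in this mail):
#   find /var/disks/sda -type f | head -n 100 |
#       while read -r f; do read_latency_ms "$f" 2M; done | sort -n | tail -n 1
```

The same function works on a local path on the server and on the nfs
mount on the client, so the two sets of numbers can be compared
directly.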
/andreas