On 7/14/05, Hugh Caley <hcaley@plasmabat.com> wrote:
A valid point, of course, but I don't think I'm actually expecting a
single NFSd to act like an expensive Netapp. I do think that wondering
why the Netapp is twice as fast for a sequential write is a valid
question, even if the OS and NFS server subsystem are free. I was kind
of hoping someone would just say "you're getting what you should expect
to get" or "wow, that's slow, try this and this and this".
You mentioned that you were getting 300 megabits/s (about 37MB/s). I have several SLES 9 NFS servers (using a self-compiled 2.6.11.5 kernel) running on IBM x345 hardware (dual-CPU Pentium 4, 2GB RAM, dual QLogic HBAs) connected to a single LSI storage array that presents four LUNs (two from controller A and two from B). Each LUN is 1TB and built on hardware RAID 8+1. The LUNs are merged together using device mapper.
It's not uncommon with my setup to get a sustained write speed of 75MB/s on one of our SLES 9 compute systems (AMD Opterons) when doing a sequential write of an 8GB file. With two systems writing at the same time I get aggregate bandwidth better than 75MB/s (can't recall what it is).
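To put those numbers side by side, here is a rough back-of-the-envelope sketch in Python. The function names are just for illustration, and it treats "MB" loosely (it ignores the difference between 10^6 and 2^20 bytes), so take the results as approximate:

```python
def mbit_to_mbyte(mbit_per_s: float) -> float:
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbit_per_s / 8


def seconds_to_write(size_mb: float, rate_mb_per_s: float) -> float:
    """Time needed to write size_mb megabytes at a sustained rate."""
    return size_mb / rate_mb_per_s


if __name__ == "__main__":
    reported = mbit_to_mbyte(300)  # the 300 megabit figure from the original post
    file_mb = 8192                 # matches 'iozone -s 8192m'
    for label, rate in (("reported", reported), ("my setup", 75.0)):
        secs = seconds_to_write(file_mb, rate)
        print(f"{label}: {rate:.1f} MB/s, about {secs:.0f} s for {file_mb} MB")
```

In other words, the 8GB test file should take a bit under two minutes at 75MB/s and roughly twice that at the rate you reported.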
I use tcp/nfs3 and for write testing I use 'iozone -c -e -s 8192m -i 0'. I use 128 nfsds, export with 'rw,sync,no_subtree_check,no_root_squash' and add the following to sysctl.conf:
net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
net.ipv4.tcp_rmem = 4096 87380 8388608
net.ipv4.tcp_wmem = 4096 65536 8388608
net.ipv4.tcp_mem = 8388608 8388608 8388608
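If you want to confirm that the running kernel actually picked these up (for example after 'sysctl -p'), something like the following Python sketch could do it. It is only an illustration: the helper names are made up, and it assumes the usual Linux mapping of a sysctl key to a file under /proc/sys with the dots replaced by slashes.

```python
from pathlib import Path

# The settings listed above, in sysctl.conf syntax.
SETTINGS = """\
net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
net.ipv4.tcp_rmem = 4096 87380 8388608
net.ipv4.tcp_wmem = 4096 65536 8388608
net.ipv4.tcp_mem = 8388608 8388608 8388608
"""


def parse_sysctl(text):
    """Parse 'key = v1 v2 ...' lines into {key: [v1, v2, ...]}."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        settings[key.strip()] = value.split()
    return settings


def proc_path(key):
    """Assumed mapping: net.core.rmem_max -> /proc/sys/net/core/rmem_max."""
    return Path("/proc/sys") / key.replace(".", "/")


def report(settings):
    """Print whether each running value matches the wanted one."""
    for key, wanted in settings.items():
        path = proc_path(key)
        try:
            current = path.read_text().split()
        except OSError:
            print(f"{key}: cannot read {path}")
            continue
        if current == wanted:
            print(f"{key}: ok")
        else:
            print(f"{key}: differs (running: {' '.join(current)})")


if __name__ == "__main__":
    report(parse_sysctl(SETTINGS))
```

Values are compared as whitespace-separated tokens, so the tab-separated output the kernel uses for the three-value entries still matches.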
On NFS clients (Sun, Linux, IRIX) I use the mount options: nosuid,rw,bg,hard,intr,vers=3,proto=tcp,rsize=32768,wsize=32768. On AIX I use the same options, but also add the critical 'combehind' (without it, writes of large files [i.e. close to the size of physical memory] are just horrid).
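If you generate mount options from a script, the per-platform difference is small enough to sketch in a few lines of Python. The function name and the OS labels passed to it are hypothetical. Only the option strings themselves come from my setup:

```python
# Options used on every client (Sun, Linux, IRIX, AIX).
BASE_OPTIONS = [
    "nosuid", "rw", "bg", "hard", "intr",
    "vers=3", "proto=tcp", "rsize=32768", "wsize=32768",
]


def client_options(os_name):
    """Return the comma-separated mount option string for a client OS."""
    options = list(BASE_OPTIONS)
    if os_name.lower() == "aix":
        # AIX only: without this, large sequential writes crawl.
        options.append("combehind")
    return ",".join(options)


if __name__ == "__main__":
    for name in ("linux", "aix"):
        print(f"{name}: {client_options(name)}")
```

The resulting string can be passed to 'mount -o' or placed in the options field of the client's mount table.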
Chris