From: Benny Halevy <bhalevy@panasas.com>
To: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Andy Adamson <andros@netapp.com>,
quanli gui <gqlxj1987@gmail.com>,
Benny Halevy <bhalevy@tonian.com>,
linux-nfs@vger.kernel.org,
"Mueller, Brian" <bmueller@panasas.com>
Subject: Re: [nfsv4]nfs client bug
Date: Thu, 30 Jun 2011 18:42:02 +0300
Message-ID: <4E0C994A.2060302@panasas.com>
In-Reply-To: <1309448157.9544.88.camel@lade.trondhjem.org>

On 2011-06-30 18:35, Trond Myklebust wrote:
> On Thu, 2011-06-30 at 18:13 +0300, Benny Halevy wrote:
>> On 2011-06-30 17:24, Trond Myklebust wrote:
>>> On Thu, 2011-06-30 at 09:36 -0400, Andy Adamson wrote:
>>>> On Jun 29, 2011, at 10:32 PM, quanli gui wrote:
>>>>
>>>>> When I run iperf from one client to the 4 ds, the network
>>>>> throughput is 890MB/s. That shows the network is indeed 10GE non-blocking.
>>>>>
>>>>> a. About block size: I use bs=1M with dd.
>>>>> b. We do use TCP (doesn't NFSv4 use TCP by default?)
>>>>> c. What are jumbo frames? How do I set the MTU automatically?
>>>>>
>>>>> Brian, do you have some more tips?
>>>>
>>>> 1) Set the mtu on both the client and the server 10G interface. Sometimes 9000 is too high. My setup uses 8000.
>>>> To set the MTU on interface eth0:
>>>>
>>>> % ifconfig eth0 mtu 9000
>>>>
>>>> iperf will report the MTU of the full path between client and server - use it to verify the MTU of the connection.
>>>>
>>>> 2) Increase the # of rpc_slots on the client.
>>>> % echo 128 > /proc/sys/sunrpc/tcp_slot_table_entries
>>>>
>>>> 3) Increase the # of server threads
>>>>
>>>> % echo 128 > /proc/fs/nfsd/threads
>>>> % service nfs restart
>>>>
>>>> 4) Ensure the TCP buffers on both the client and the server are large enough for the TCP window.
>>>> Calculate the required buffer size by pinging the server from the client with an MTU-sized packet, then multiplying the round-trip time by the interface capacity:
>>>>
>>>> % ping -s 9000 server - say 108 ms average
>>>>
>>>> 10Gbits/sec = 1,250,000,000 Bytes/sec * .108 sec = 135,000,000 bytes
>>>>
>>>> Use this number to set the following:
>>>> sysctl -w net.core.rmem_max=135000000
>>>> sysctl -w net.core.wmem_max=135000000
>>>> sysctl -w "net.ipv4.tcp_rmem=<first number unchanged> <second unchanged> 135000000"
>>>> sysctl -w "net.ipv4.tcp_wmem=<first number unchanged> <second unchanged> 135000000"
>>>>
>>>> 5) mount with rsize=131072,wsize=131072
>>>
>>> 6) Note that NFS always guarantees that the file is _on_disk_ after
>>> close(), so if you are using 'dd' to test, then you should be using the
>>> 'conv=fsync' flag (i.e. 'dd if=/dev/zero of=test count=20k conv=fsync')
>>> in order to obtain a fair comparison between the NFS and local disk
>>> performance. Otherwise, you are comparing NFS and local _pagecache_
>>> performance.
>>
>> FWIW, modern versions of gnu dd (not sure exactly which version changed that)
>> calculate and report throughput after close()ing the output file.
>
> ...but not after syncing it unless you explicitly request that.
>
> On most (all?) local filesystems, close() does not imply fsync().
Right. My point is that for benchmarking NFS, conv=fsync won't show
any noticeable difference. We're in complete agreement that it's required
for benchmarking local file system performance.
Benny
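
One way to check the conv=fsync point on a given mount is to run dd both
ways and compare the throughput it reports. This is only a rough sketch:
the target path and the DD_TARGET variable are hypothetical, and the small
count is chosen only to keep the run short. Point the target at a file on
the NFS mount, then at a local file system, to see where the two runs
diverge.

```shell
#!/bin/sh
# Hypothetical target path; set DD_TARGET to a file on the mount under test.
target=${DD_TARGET:-./dd_fsync_test.tmp}

# Run 1: no explicit sync requested. dd prints its throughput summary
# on stderr when it finishes.
dd if=/dev/zero of="$target" bs=1M count=16

# Run 2: conv=fsync asks dd to synchronize output data and metadata
# just before finishing, so the flush is included in the reported time.
dd if=/dev/zero of="$target" bs=1M count=16 conv=fsync

# Clean up the scratch file.
rm -f "$target"
```

A single pair of runs is noisy, so repeating each run a few times gives a
fairer picture.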
>
> Trond
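
For reference, the buffer-size arithmetic in tip 4 above is the
bandwidth-delay product: link rate in bytes per second times round-trip
time. A minimal sketch follows; the function name bdp_bytes is made up for
illustration, and it assumes a shell with 64-bit integer arithmetic.

```shell
#!/bin/sh
# bdp_bytes LINK_BITS_PER_SEC RTT_MS
# Prints the bandwidth-delay product in bytes.
# (Hypothetical helper, not part of any existing tool.)
bdp_bytes() {
    link_bps=$1
    rtt_ms=$2
    # bits/sec -> bytes/sec, then scale by the RTT given in milliseconds.
    echo $(( link_bps / 8 * rtt_ms / 1000 ))
}

# The 10 Gbit/s link and 108 ms average RTT from the example above.
bdp_bytes 10000000000 108
```

With those inputs it prints 135000000, matching the figure used for
rmem_max, wmem_max, and the third field of tcp_rmem and tcp_wmem.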