From: Jeff Wright <jeff.wright@oracle.com>
To: linux-nfs@vger.kernel.org
Cc: Jeff Wright <jeff.wright@oracle.com>,
	Craig Flaskerud <Craig.flaskerud@oracle.com>,
	Donna Harland <donna.harland@oracle.com>
Subject: Help with NFS over 10GbE performance - possible NFS client to TCP bottleneck
Date: Tue, 22 May 2012 10:21:58 -0600
Message-ID: <4FBBBD26.2090203@oracle.com>

Team,

I am working on a team implementing a configuration with an OEL kernel 
(2.6.32-300.3.1.el6uek.x86_64) and the kernel NFS client accessing a 
Solaris 10 NFS server over 10GbE.  We are trying to resolve what appears 
to be a bottleneck between the Linux kernel NFS client and the TCP 
stack.  Specifically, when we are running write I/O from the file 
system, the TCP send queue on the Linux client is empty (save a couple 
of bursts), the TCP receive queue on the Solaris 10 NFS server is empty, 
and the RPC pending request queue on the Solaris 10 NFS server is zero.  
If we dial the network down to 1GbE we get a nice deep TCP send queue on 
the client, which is the bottleneck I was hoping to get to with 10GbE.  
At this point, we are pretty sure the S10 NFS server can run to at least 
1000 MBPS.
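
In case it is useful, this is roughly how the queues are being watched 
on the Linux side (a sketch only; the address is the NFS server from the 
mount below, and the Solaris side is observed with the usual 
netstat/nfsstat tools there):

    # watch the TCP send queue (Send-Q) on the connection to the server
    watch -n 1 'netstat -tan | grep 192.168.44.51:2049'

    # ss shows the same plus per-connection TCP state (cwnd, rtt, ...)
    ss -tin dst 192.168.44.51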

So far, we have implemented the following Linux kernel tunes:

sunrpc.tcp_slot_table_entries = 128
net.core.rmem_default = 4194304
net.core.wmem_default = 4194304
net.core.rmem_max = 4194304
net.core.wmem_max = 4194304
net.ipv4.tcp_rmem = 4096 1048576 4194304
net.ipv4.tcp_wmem = 4096 1048576 4194304
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_syncookies = 1
net.core.netdev_max_backlog = 300000
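
For completeness, a sketch of how these can be applied and verified at 
runtime, and (as one option) how the slot table value can be persisted 
as a sunrpc module parameter so it is in place before the mount is 
established; the file name under /etc/modprobe.d is arbitrary:

    # apply at runtime and confirm the values took effect
    sysctl -w sunrpc.tcp_slot_table_entries=128
    sysctl sunrpc.tcp_slot_table_entries net.ipv4.tcp_wmem

    # persist the slot table setting so it is applied when the sunrpc
    # module loads, before any NFS mount is made
    echo "options sunrpc tcp_slot_table_entries=128" > /etc/modprobe.d/sunrpc-local.conf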

In addition, we are running jumbo frames on the 10GbE NIC and we have 
cpuspeed and irqbalance disabled (no noticeable change when we did 
this).  The mount options on the client side are as follows:

192.168.44.51:/export/share on /export/share type nfs 
(rw,nointr,bg,hard,rsize=1048576,wsize=1048576,proto=tcp,vers=3,addr=192.168.44.51)
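
For reference, the options actually negotiated with the server 
(rsize/wsize, transport, version) can be double-checked on the client; 
a minimal sketch:

    # per-mount options and statistics as seen by the NFS client
    nfsstat -m

    # the kernel's view of the same mount
    grep nfs /proc/mounts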

In this configuration we get about 330 MBPS of write throughput with 16 
pending stable writes (files opened with O_DIRECT, synchronous I/O, no 
kernel aio in the I/O application).  If we scale beyond 16 pending I/Os, 
response time increases but throughput remains fixed.  It feels like 
there is a problem getting more than 16 pending I/Os out to TCP, but we 
can't tell for sure based on our observations so far.  We did notice 
that tuning wsize down to 32kB increased throughput to 400 MBPS, but we 
could not identify the root cause of this change.
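
If it helps anyone reproduce this, the workload is roughly equivalent to 
the sketch below (our real application issues its own O_DIRECT 
synchronous writes rather than using dd, and the file names here are 
arbitrary):

    # approximate 16 pending direct, synchronous 1MB writes
    for i in $(seq 1 16); do
        dd if=/dev/zero of=/export/share/ddtest.$i bs=1048576 count=4096 oflag=direct &
    done
    wait
    # aggregate throughput can be read from the dd summaries or iostat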

Please let us know if you have any suggestions for either diagnosing the 
bottleneck more precisely or relieving it.  Thank you in advance.

Sincerely,

Jeff

Thread overview: 5+ messages
2012-05-22 16:21 Jeff Wright [this message]
2012-06-13 15:08 ` Help with NFS over 10GbE performance - possible NFS client to TCP bottleneck Andy Adamson
2012-06-13 15:17   ` Jeff Wright
2012-06-14 14:53     ` Andy Adamson
2012-06-14 16:55       ` Jeff Wright
