Re: NFS tuning - high performance throughput.

From: Greg Banks <gnb@sgi.com>
To: "Bill Rugolsky Jr." <brugolsky@telemetry-investments.com>
Cc: "M. Todd Smith" <todd@sohovfx.com>, nfs@lists.sourceforge.net
Subject: Re: NFS tuning - high performance throughput.
Date: Thu, 16 Jun 2005 08:47:52 +1000
Message-ID: <20050615224752.GA18915@sgi.com>
In-Reply-To: <20050615174701.GC31465@ti64.telemetry-investments.com>

On Wed, Jun 15, 2005 at 01:47:01PM -0400, Bill Rugolsky Jr. wrote:
> These days, I'd use TCP.

Agreed.

> Additionally, modern NICs like e1000 support
> TSO (TCP Segmentation Offload), and though TSO has had its share of bugs,
> it is the better path forward.

Please don't tell him about TSO, it doesn't quite work yet ;-)

> > RAID 5, 4k strip size, XFS file system.
>  
> 4K?  That's pretty tiny.

It's extremely small.  We don't use anything less than 64KiB.  Use a
larger stripe size, and tell XFS what stripe size you're using so it
can align IOs correctly: RTFM about the options -d sunit, -d swidth,
-l sunit, and -l version to mkfs.xfs.  Also, make sure you align the
start of the XFS filesystem to a RAID stripe width; this may require
futzing with your volume manager config.
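A rough sketch of the arithmetic behind those options, assuming the usual
mkfs.xfs convention that -d sunit and -d swidth are given in 512-byte
sectors, and that a RAID 5 stripe width covers the data disks only (one
fewer than the number of disks in the array).  The helper names and the
5-disk, 64KiB example geometry are made up for illustration; they are not
from this thread.

```python
SECTOR = 512  # mkfs.xfs expresses -d sunit/swidth in 512-byte sectors


def xfs_stripe_opts(chunk_bytes, total_disks, parity_disks=1):
    """Return (sunit, swidth) in 512-byte sectors for a striped array.

    chunk_bytes is the per-disk stripe unit; parity_disks=1 models RAID 5.
    """
    if chunk_bytes % SECTOR:
        raise ValueError("chunk size must be a multiple of 512 bytes")
    data_disks = total_disks - parity_disks
    if data_disks < 1:
        raise ValueError("need at least one data disk")
    sunit = chunk_bytes // SECTOR
    return sunit, sunit * data_disks


def is_stripe_aligned(start_bytes, chunk_bytes, total_disks, parity_disks=1):
    """True if a filesystem starting at start_bytes begins on a full-stripe boundary."""
    width = chunk_bytes * (total_disks - parity_disks)
    return start_bytes % width == 0


# Hypothetical geometry: 5-disk RAID 5 with a 64KiB stripe unit.
sunit, swidth = xfs_stripe_opts(64 * 1024, total_disks=5)
print(f"-d sunit={sunit},swidth={swidth}")  # -d sunit=128,swidth=512

# A filesystem starting at sector 63 is not aligned to the 256KiB stripe width.
print(is_stripe_aligned(63 * SECTOR, 64 * 1024, total_disks=5))  # False
```

The second check is the reason the volume manager or partition offset may
need adjusting: the start of the filesystem has to be a multiple of the
full stripe width, not just of the stripe unit.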

> OTOH, using too large a stripe with NFS over RAID5
> can be no good either, if it results in partial writes that require a
> read/modify/write cycle, so it is perhaps best not to go very large.

It depends on your workload; for pure streaming workloads a larger stripe
is generally better, up to a point determined by your filesystem, the
amount of cache in your RAID controller, and other limitations.  We have
customer sites with 2MiB stripe sizes for local XFS filesystems (*not*
for NFS service), and it works just fine.  But beware: on an NFS server
it's easier to get into the partial write case than with local IO.
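To make the partial write case concrete, here is a minimal sketch that
works out how much of a single write lands in stripes it does not fully
cover, which is the part that would need a read/modify/write on RAID 5.
It deliberately ignores any coalescing done by the page cache or a
controller's write-back cache, and the function name, the 32KiB write
size, and the 256KiB stripe width are assumptions chosen for illustration.

```python
def partial_stripe_bytes(offset, length, stripe_width):
    """Bytes of a write that fall in stripes the write only partly covers."""
    end = offset + length
    first_full = -(-offset // stripe_width) * stripe_width  # offset rounded up
    last_full = (end // stripe_width) * stripe_width        # end rounded down
    if last_full <= first_full:
        return length  # the write covers no complete stripe at all
    return (first_full - offset) + (end - last_full)


KIB = 1024
WIDTH = 256 * KIB  # hypothetical: 4 data disks x 64KiB stripe unit

# A small write (assumed 32KiB here) is entirely partial.
print(partial_stripe_bytes(0, 32 * KIB, WIDTH) // KIB)          # 32

# A stripe-aligned, stripe-sized write needs no read/modify/write.
print(partial_stripe_bytes(0, 256 * KIB, WIDTH) // KIB)         # 0

# A large but misaligned write is partial at both ends.
print(partial_stripe_bytes(32 * KIB, 512 * KIB, WIDTH) // KIB)  # 256
```

The wider the stripe, the larger a write has to be (and the better
aligned) before any of it counts as a full-stripe write.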

> You might want to compare a local sequential read test with
> 
> 	/sbin/blockdev --setra {...,4096,8192,16384,...} <device>
> 
> Traffic on the linux-lvm list suggests increasing the readahead on the
> logical device, and decreasing it on the underlying physical devices,
> but your mileage may vary.

Agreed, I would try tuning the logical block device's readahead upwards.
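For a sense of scale, blockdev --setra takes its value in 512-byte
sectors, so the numbers quoted above are larger than they may look.  A
small sketch of the conversion (the function name is made up, and the
256KiB full-stripe width is only an assumed example geometry):

```python
SECTOR = 512  # blockdev --setra counts 512-byte sectors


def setra_to_kib(sectors):
    """Convert a blockdev --setra value to KiB of readahead."""
    return sectors * SECTOR // 1024


STRIPE_WIDTH_KIB = 256  # hypothetical full-stripe width

for ra in (4096, 8192, 16384):
    kib = setra_to_kib(ra)
    print(f"--setra {ra}: {kib} KiB, {kib // STRIPE_WIDTH_KIB} full stripes")
```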

> Experience with Ext3 data journaling indicates that dropping expire/writeback
> can help to smooth out I/O:
> 
>    vm.dirty_expire_centisecs = {300-1000}
>    vm.dirty_writeback_centisecs = {50-100}
> 

The performance limitation that is helped by tuning the VM to push
dirty pages earlier is in NFS, not in the underlying filesystem, so this
technique is useful with XFS too.
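Both sysctls are expressed in centiseconds (hundredths of a second), so
the quoted ranges work out to roughly 3 to 10 seconds for expiry and 0.5
to 1 second between writeback wakeups.  A minimal sketch that turns
values in seconds into sysctl.conf-style lines; the helper name is
hypothetical and the particular values are just the low ends of the
ranges quoted above, not a recommendation.

```python
def dirty_sysctl_lines(expire_seconds, writeback_seconds):
    """Format the two VM writeback sysctls, converting seconds to centiseconds."""
    settings = {
        "vm.dirty_expire_centisecs": round(expire_seconds * 100),
        "vm.dirty_writeback_centisecs": round(writeback_seconds * 100),
    }
    return [f"{name} = {value}" for name, value in settings.items()]


# Low end of the quoted ranges: expire after 3 s, wake the flusher every 0.5 s.
for line in dirty_sysctl_lines(3, 0.5):
    print(line)
# vm.dirty_expire_centisecs = 300
# vm.dirty_writeback_centisecs = 50
```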

> Again, I have no experience with XFS.  Since it only does meta-data journaling,
> (equivalent of Ext3 data=writeback), its performance characteristics are probably
> quite different.

XFS also does a bunch of clever things (which I don't really understand)
to group IO going to disk and to limit metadata traffic for allocation.

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.