Re: splice/vmsplice performance test results

From: Jens Axboe <jens.axboe@oracle.com>
To: Jim Schutt <jaschut@sandia.gov>
Cc: linux-kernel@vger.kernel.org
Subject: Re: splice/vmsplice performance test results
Date: Thu, 16 Nov 2006 21:25:29 +0100
Message-ID: <20061116202529.GH7164@kernel.dk>
In-Reply-To: <1163700539.2672.14.camel@sale659.sandia.gov>

On Thu, Nov 16 2006, Jim Schutt wrote:
> Hi,
> 
> I've done some testing to see how splice/vmsplice perform
> vs. other alternatives on transferring a large file across
> a fast network.  One option I tested was to use vmsplice 
> to get a 1-copy receive, but it didn't perform as well 
> as I had hoped.  I was wondering if my results were at odds
> with what other people have observed.
> 
> I've two systems, each with:
>   Tyan S2895 motherboard
>   2 ea. 2.6 GHz Opteron
>   1 GiB memory
>   Myricom Myri-10G 10 Gb/s NIC (PCIe x8)
>   2.6.19-rc5-g134a11f0 on FC4
> 
> In addition, one system has a 3ware 9590-8ML (PCIe) and a 3ware
> 9550SX-8LP (PCI-X), with 16 Seagate Barracuda 7200.10 SATA drives 
> (250 GB ea., NCQ enabled).  Write caching is enabled on the 3ware
> cards.
> 
> The Myricom cards are connected back-to-back using 9000 byte MTU. 
> I baseline the network performance with 'iperf -w 1M -l 64K'
> and get 6.9 Gb/s.
> 
> After a fair amount of testing, I settled on a 4-way software
> RAID0 on top of 4-way hardware RAID0 units as giving the best
> streaming performance.  The file system is XFS, with the stripe
> unit set to the hardware RAID chunk size, and the stripe width 
> 16 times that.  
> 
> Disk tuning parameters in /sys/block/sd*/queue are default
> values, except queue/nr_requests = 5 gives me best performance.
> (It seems like the 3ware cards slow down a little if I feed them 
> too much data on the streaming write test I'm using.)
> 
> I baseline file write performance with 
>   sync; time { dd if=/dev/zero of=./zero bs=32k count=512k; sync; }
> and get 465-520 MB/s (highly variable).
> 
> I test baseline file read performance with
>   time dd if=./zero of=/dev/null bs=32k count=512k
> and get 950 MB/s (fairly repeatable).
> 
> My test program can do one of the following:
> 
> send data:
>  A) read() from file into buffer, write() buffer into socket
>  B) mmap() section of file, write() that into socket, munmap()
>  C) splice() from file to pipe, splice() from pipe to socket
> 
> receive data:
>  1) read() from socket into buffer, write() buffer into file
>  2) ftruncate() to extend file, mmap() new extent, read() 
>       from socket into new extent, munmap()
>  3) read() from socket into buffer, vmsplice() buffer to 
>      pipe, splice() pipe to file (using the double-buffer trick)
> 
> Here's the results, using:
>  - 64 KiB buffer, mmap extent, or splice
>  - 1 MiB TCP window
>  - 16 GiB data sent across network
> 
> A) from /dev/zero -> 1) to /dev/null : 857 MB/s (6.86 Gb/s)
> 
> A) from file      -> 1) to /dev/null : 472 MB/s (3.77 Gb/s)
> B) from file      -> 1) to /dev/null : 366 MB/s (2.93 Gb/s)
> C) from file      -> 1) to /dev/null : 854 MB/s (6.83 Gb/s)
> 
> A) from /dev/zero -> 1) to file      : 375 MB/s (3.00 Gb/s)
> A) from /dev/zero -> 2) to file      : 150 MB/s (1.20 Gb/s)
> A) from /dev/zero -> 3) to file      : 286 MB/s (2.29 Gb/s)
> 
> I had (naively) hoped the read/vmsplice/splice combination would 
> run at the same speed I can write a file, i.e. at about 450 MB/s
> on my setup.  Do any of my numbers seem bogus, so I should look 
> harder at my test program?
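
[Editorial sketch, not part of the original message.] Receive option 3
above (read() from the socket, vmsplice() the buffer into a pipe, splice()
the pipe into the file) could look roughly like the following. The test
program itself is not shown in this message, so the helper name
recv_vmsplice_to_file, the CHUNK constant, and the way the two buffers are
alternated are all assumptions made for illustration; the "double-buffer
trick" is interpreted here as never reading into the buffer that was most
recently handed to the pipe.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define CHUNK (64 * 1024)   /* 64 KiB, the buffer size quoted above */

/* Hypothetical helper: copy everything readable from sock_fd into file_fd.
 * Returns the number of bytes written to the file, or -1 on error. */
ssize_t recv_vmsplice_to_file(int sock_fd, int file_fd)
{
    int pfd[2];
    void *buf[2] = { NULL, NULL };
    ssize_t total = -1;
    int cur = 0;

    if (pipe(pfd) < 0)
        return -1;
    buf[0] = malloc(CHUNK);
    buf[1] = malloc(CHUNK);
    if (!buf[0] || !buf[1])
        goto out;

    total = 0;
    for (;;) {
        /* The one real copy: socket data into a user buffer. */
        ssize_t n = read(sock_fd, buf[cur], CHUNK);
        if (n == 0)
            break;                      /* peer closed the connection */
        if (n < 0) { total = -1; break; }

        struct iovec iov = { .iov_base = buf[cur], .iov_len = (size_t)n };
        while (iov.iov_len > 0) {
            /* Map the user pages into the pipe; may accept fewer bytes
             * than requested if the pipe has too few free slots. */
            ssize_t v = vmsplice(pfd[1], &iov, 1, 0);
            if (v < 0) { total = -1; goto out; }
            iov.iov_base = (char *)iov.iov_base + v;
            iov.iov_len -= (size_t)v;

            /* Drain exactly what was queued from the pipe into the file. */
            while (v > 0) {
                ssize_t s = splice(pfd[0], NULL, file_fd, NULL,
                                   (size_t)v, SPLICE_F_MOVE);
                if (s <= 0) { total = -1; goto out; }
                v -= s;
                total += s;
            }
        }
        cur ^= 1;                       /* switch to the other buffer */
    }
out:
    free(buf[0]);
    free(buf[1]);
    close(pfd[0]);
    close(pfd[1]);
    return total;
}
```

The sketch drains the pipe after every vmsplice() call, which keeps it
simple but serializes the network read and the file write; a tuned version
would presumably let them overlap.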

Read-ahead could be playing a role here; I'd have to take a closer look at
the generated io patterns to say more about that. Any chance you can
capture iostat or blktrace info for such a run, so we can compare what
goes to the disk? Can you pass along the test program?

> Or is read+write really the fastest way to get data off a
> socket and into a file?

splice() should be just as fast of course, and more efficient. Not a lot
of real-life performance tuning has gone into it yet, so I would not be
surprised if we need to smooth a few rough edges.
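
[Editorial sketch, not part of the original message.] For reference, a
plain splice() path such as send option C (file to pipe, pipe to socket)
can be written as the loop below. The helper name splice_copy and the
CHUNK constant are hypothetical. The same loop only works in the receive
direction if the kernel in use supports splice() from the input
descriptor, which should not be assumed for a socket on the kernel
discussed in this thread.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK (64 * 1024)   /* 64 KiB per splice() call, as in the tests */

/* Hypothetical helper: move up to len bytes from in_fd to out_fd through
 * a pipe. Stops early at end of input. Returns bytes moved, or -1. */
ssize_t splice_copy(int in_fd, int out_fd, size_t len)
{
    int pfd[2];
    ssize_t total = 0;

    if (pipe(pfd) < 0)
        return -1;

    while (len > 0) {
        size_t want = len < (size_t)CHUNK ? len : (size_t)CHUNK;

        /* Stage 1: reference the input's pages from the pipe. */
        ssize_t n = splice(in_fd, NULL, pfd[1], NULL, want, SPLICE_F_MOVE);
        if (n == 0)
            break;                      /* end of input */
        if (n < 0) { total = -1; break; }
        len -= (size_t)n;

        /* Stage 2: push everything now in the pipe to the output. */
        while (n > 0) {
            ssize_t m = splice(pfd[0], NULL, out_fd, NULL,
                               (size_t)n, SPLICE_F_MOVE);
            if (m <= 0) { total = -1; goto out; }
            n -= m;
            total += m;
        }
    }
out:
    close(pfd[0]);
    close(pfd[1]);
    return total;
}
```

A caller would open the file and the connected socket itself and pass
both descriptors in, for example splice_copy(file_fd, sock_fd, file_size).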

-- 
Jens Axboe
