Date: Wed, 26 Sep 2007 10:49:31 -0700 (PDT)
From: "Bryan J. Smith"
Reply-To: b.j.smith@ieee.org
Subject: Re: [UNSURE] Re: mkfs options for a 16x hw raid5 and xfs (mostly large files)
List-Id: xfs
To: Justin Piszcz, "Bryan J. Smith"
Cc: Ralf Gross, linux-xfs@oss.sgi.com

Justin Piszcz wrote:
> Do you have any type of benchmarks to simulate the load you are
> mentioning?

Yes: write distinct, non-zero 100GB data files from 30 NFSv3 sync
clients at the same time.  You can easily script firing that off and
record the number of seconds it takes to commit.  Use NFS over UDP to
avoid the overhead of TCP.

> What did HW RAID drop to when the same test was run with SW
> RAID / 50 MBps under load?

With hardware RAID I saw an aggregate commit average of around 150MBps
using a pair of 8-channel 3Ware Escalade 9550SX cards (each on its own
PCI-X bus), with an LVM stripe across them.  Understand the test
literally took 5 hours to run!  The software RAID-50 setup -- two
"dumb" 8-channel Marvell SATA cards (each on its own PCI-X bus), again
with an LVM stripe across them -- had not completed after 15 hours
(overnight), so I finally terminated it.  Each system had a 4x GbE
trunk to a layer-3 switch.  I would have run the same test over SMB
TCP/IP, possibly with a LeWiz 4x GbE RX TOE HBA, except I honestly
didn't have the time to wait on it.

> Did it achieve better performance due to an on-board /
> raid-card controller cache, or?
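(As an aside, the multi-client commit test described above can be
scripted along roughly these lines.  Host names, mount point, and the
per-client source files are hypothetical -- this is an illustration,
not the script actually used for the numbers quoted.)

```python
# Sketch of the 30-client, 100GB-per-client NFS commit test.
import subprocess
import time

CLIENTS = ["client%02d" % i for i in range(1, 31)]  # 30 NFSv3 sync clients
SIZE_GB = 100
MOUNT = "/mnt/nfs"  # NFS export mounted with -o udp,sync on each client

def dd_command(client, size_gb=SIZE_GB, mount=MOUNT):
    """Shell command for one client: write a distinct, non-zero 100GB file.

    Reading a pre-generated per-client source file (rather than
    /dev/urandom) keeps the test from measuring the RNG instead of the
    commit path; conv=fsync makes dd wait for the final commit.
    """
    return ("dd if=/tmp/src-%s.bin of=%s/%s.bin bs=1M count=%d conv=fsync"
            % (client, mount, client, size_gb * 1024))

def commit_mbps(total_bytes, seconds):
    """Aggregate commit rate in MB/s (decimal MB, as in '150MBps')."""
    return total_bytes / 1e6 / seconds

def run_test():
    """Fire all clients in parallel over ssh; time until the last fsync."""
    start = time.time()
    procs = [subprocess.Popen(["ssh", c, dd_command(c)]) for c in CLIENTS]
    for p in procs:
        p.wait()
    elapsed = time.time() - start
    total_bytes = len(CLIENTS) * SIZE_GB * 1000**3
    print("aggregate commit: %.0f MB/s over %.0f s"
          % (commit_mbps(total_bytes, elapsed), elapsed))
```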
It has nothing to do with cache.  The OS is far better at scheduling
and buffering in system RAM, and it buffers asynchronously, whereas
many HW RAID drivers are synchronous to the NVRAM of the HW RAID card
(that's part of the problem with such comparisons).  The real issue is
that in software RAID-5 you stream 100% of the data through the
general system interconnect for the LOAD-XOR-STO operation.  XORs are
extremely fast; LOAD/STO through a general-purpose CPU is not.

It's the same reason we don't build layer-3 switches out of
general-purpose CPUs, but out of a "core" CPU with NPE (network
processor engine) ASICs.  Same deal with most HW RAID cards: a "core"
CPU with SPE ASICs, for off-load from the general CPU's system
interconnect.  The XORs are done "in-line" with the transfer, instead
of hogging up the system interconnect.  It's the direct difference
between PIO and DMA: an in-line NPE/SPE ASIC basically acts like a DMA
transfer, in real time.  A general-purpose CPU and its interconnect
cannot do that, so it has all the issues of PIO.  PIO on a
general-purpose CPU is to be avoided at all costs when you have other
needs for the system interconnect, like I/O.

If you don't have much else loading the I/O -- like in a web server or
other read-mostly system, where you're not doing the writes -- then it
doesn't matter, and software RAID-5 is great.

-- 
Bryan J. Smith     Professional, Technical Annoyance
b.j.smith@ieee.org     http://thebs413.blogspot.com
--------------------------------------------------
Fission Power: An Inconvenient Solution
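P.S.  To make the LOAD-XOR-STO point concrete, here is a toy sketch
(plain Python, obviously not the md driver) of the work a software
RAID-5 implementation must do for every full stripe it writes: every
data byte must cross the host's memory bus at least once just to feed
the XOR, on top of the actual transfer to disk.

```python
# Toy RAID-5 parity calculation.  For an N-disk stripe, the CPU must
# LOAD every byte of every data block, XOR it into the parity
# accumulator, and STO the result -- all over the same system
# interconnect the rest of your I/O is competing for.
def raid5_parity(data_blocks):
    """XOR all data blocks of a stripe into one parity block."""
    parity = bytearray(len(data_blocks[0]))
    for block in data_blocks:          # one full pass per block: LOAD...
        for i, byte in enumerate(block):
            parity[i] ^= byte          # ...XOR, then STO
    return bytes(parity)

# The property that makes degraded-mode reads work: XOR-ing the parity
# with all surviving blocks regenerates the missing block.
def reconstruct(surviving_blocks, parity):
    return raid5_parity(surviving_blocks + [parity])
```

A hardware card's XOR engine computes exactly the same parity, but
in-line on the card as the data streams past, so none of that traffic
ever touches the host's interconnect.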