From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mirko Benz
Subject: Re: RAID 0 over HW RAID
Date: Thu, 11 May 2006 15:20:44 +0200
Message-ID: <44633A2C.8010503@web.de>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
Sender: linux-raid-owner@vger.kernel.org
To: Mark Hahn
Cc: linux-raid@vger.kernel.org
List-Id: linux-raid.ids

Hello,

/sys/block/sdc/queue/max_sectors_kb is 256 for both HW RAID devices.
We have tested with larger block sizes (256K, 1MB), which actually
gives slightly lower performance. Access is sequential.

We ran some more tests with dd to measure performance and hit two
strange issues that I have no explanation for.

1)
test:~# dd if=/dev/sdc of=/dev/null bs=128k count=30000
30000+0 records in
30000+0 records out
3932160000 bytes transferred in 11.311464 seconds (347626088 bytes/sec)

test:~# dd if=/dev/sdc1 of=/dev/null bs=128k count=30000
30000+0 records in
30000+0 records out
3932160000 bytes transferred in 21.004938 seconds (187201694 bytes/sec)

Read performance from the same HW RAID differs between the entire
device (sdc) and a partition on it (sdc1).

2)
test:~# dd if=/dev/md0 of=/dev/null bs=128k count=30000
30000+0 records in
30000+0 records out
3932160000 bytes transferred in 9.950705 seconds (395163959 bytes/sec)

test:~# dd if=/dev/md0 of=/dev/null bs=128k count=30000 skip=1000
30000+0 records in
30000+0 records out
3932160000 bytes transferred in 6.398646 seconds (614530000 bytes/sec)

When skipping some MBytes, performance improves significantly and is
almost the sum of the two HW RAID controllers.

Regards,
Mirko

Mark Hahn wrote:
>> - 2 RAID controllers: ARECA with 7 SATA disks each (RAID5)
>
> what are the /sys/block settings for the blockdevs these export?
> I'm thinking about max*sectors_kb.
>
>> - stripe size is always 64k
>>
>> Measured with IOMETER (MB/s, 64 kb block size with sequential I/O).
>
> I don't see how that could be expected to work well. you're doing
> sequential 64K IO from user-space (that is, inherently one at a time),
> and those map onto a single chunk via md raid0. (well, if the IOs
> are aligned - but in any case you won't be generating 128K IOs which
> would be the min expected to really make the raid0 shine.)
>
>> one HW RAID controller:
>> - R: 360 W: 240
>> two HW RAID controllers:
>> - R: 619 W: 480 (one IOMETER worker per device)
>> MD0 over two HW RAID controllers:
>> - R 367 W: 433 (one IOMETER worker over md device)
>>
>> Read throughput is similar to a single controller. Any hint how to
>> improve that?
>> Using a larger block size does not help.
>
> which blocksize are you talking about? larger blocksize at the app
> level should help. _smaller_ block/chunk size at the md level.
> and of course those both interact with the block size preferred
> by the areca.
>
>> We are considering using MD to combine HW RAID controllers with battery
>> backup support for better data protection.
>
> maybe. all this does is permit the HW controller to reorder transactions,
> which is not going to matter much if your loads are, in fact, sequential.
>
>> In this scenario md should do
>> no write caching.
>
> in my humble understanding, MD doesn't do WC.
>
>> Is it possible to use something like O_DIRECT with md?
>
> certainly (exactly O_DIRECT). this is mainly instruction to the
> pagecache, not MD.
> I presume O_DIRECT mainly just follows a write
> by a barrier, which MD can respect and pass to the areca driver
> (which presumably also respects it, though the point of battery-backed
> cache would be to let the barrier complete before the IO...)
>
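
On the O_DIRECT question, a quick way to see how much of the differences
above come from the page cache and readahead (assuming a dd from GNU
coreutils 5.3 or later, which understands iflag=direct) would be to repeat
the sequential reads with direct I/O and compare the readahead settings
of the devices:

# same reads as above, but with O_DIRECT so the page cache and readahead
# are taken out of the picture (bs=128k keeps the buffers sector-aligned)
dd if=/dev/md0 of=/dev/null bs=128k count=30000 iflag=direct
dd if=/dev/sdc of=/dev/null bs=128k count=30000 iflag=direct
dd if=/dev/sdc1 of=/dev/null bs=128k count=30000 iflag=direct

# readahead (in 512-byte sectors); the whole device, a partition on it
# and the md device stacked on top can all carry different values
blockdev --getra /dev/sdc
blockdev --getra /dev/sdc1
blockdev --getra /dev/md0

If the direct-I/O numbers for sdc and sdc1 come out the same, the gap in 1)
would point at a caching/readahead effect rather than the controller.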
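
For completeness, max_sectors_kb is a per-device sysfs tunable; a minimal
sketch of checking and raising it on one of the Areca devices (the writable
value is capped by whatever max_hw_sectors_kb the driver reports) might
look like this:

# largest request size the block layer will currently build, in KiB,
# and the hard limit imposed by the driver/controller
cat /sys/block/sdc/queue/max_sectors_kb
cat /sys/block/sdc/queue/max_hw_sectors_kb

# raise the soft limit (only takes effect up to the hard limit above)
echo 512 > /sys/block/sdc/queue/max_sectors_kb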