* RAID 0 over HW RAID
From: Mirko Benz @ 2006-05-10 13:37 UTC
To: linux-raid
Hello,
We want to combine two HW RAID controllers with RAID 0 using MD and are
seeing some performance issues.
Setup:
- Linux 2.6.16, 64-bit, dual Xeon, 1 GB RAM
- 2 RAID controllers: ARECA with 7 SATA disks each (RAID5)
- SW RAID 0 over the two HW RAID controllers
- stripe size is always 64k
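For reference, the md device was created roughly along these lines (a sketch;
device names are illustrative, the chunk size matches the 64k stripe above):

mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=64 /dev/sdc /dev/sdd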
Measured with IOMETER (MB/s, 64 kb block size with sequential I/O).
one HW RAID controller:
- R: 360 W: 240
two HW RAID controllers:
- R: 619 W: 480 (one IOMETER worker per device)
MD0 over two HW RAID controllers:
- R: 367 W: 433 (one IOMETER worker over the md device)
Read throughput is similar to a single controller. Any hints on how to
improve that?
Using a larger block size does not help.
We are considering using MD to combine HW RAID controllers with battery
backup support for better data protection. In this scenario md should do
no write caching. Is it possible to use something like O_DIRECT with md?
Regards,
Mirko
* Re: RAID 0 over HW RAID
From: Mark Hahn @ 2006-05-10 15:40 UTC
To: Mirko Benz; +Cc: linux-raid
> - 2 RAID controllers: ARECA with 7 SATA disks each (RAID5)
What are the /sys/block settings for the block devices these export?
I'm thinking of max*sectors_kb.
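Something like this would show them (assuming the controllers show up
as sdc and sdd):

for d in sdc sdd; do
    grep . /sys/block/$d/queue/max_sectors_kb /sys/block/$d/queue/max_hw_sectors_kb
done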
> - stripe size is always 64k
>
> Measured with IOMETER (MB/s, 64 kb block size with sequential I/O).
I don't see how that could be expected to work well. You're doing
sequential 64K IO from user space (that is, inherently one request at a
time), and each of those maps onto a single chunk via md raid0. (Well, if
the IOs are aligned - but in any case you won't be generating 128K IOs,
which would be the minimum needed to really make the raid0 shine.)
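To make that concrete, here is the raid0 offset-to-member arithmetic for a
64K chunk over two members (illustrative shell arithmetic, not an md
interface):

chunk=$((64*1024)); members=2
for off in 0 65536 131072 196608; do
    # a single aligned 64K read at this offset touches exactly one member;
    # only IOs of 128K or more span both members at once
    echo "offset $off -> member $(( (off / chunk) % members ))"
done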
> one HW RAID controller:
> - R: 360 W: 240
> two HW RAID controllers:
> - R: 619 W: 480 (one IOMETER worker per device)
> MD0 over two HW RAID controllers:
> - R 367 W: 433 (one IOMETER worker over md device)
>
> Read throughput is similar to a single controller. Any hint how to
> improve that?
> Using a larger block size does not help.
Which block size are you talking about? A larger block size at the
application level should help, and a _smaller_ chunk size at the md level.
And of course both of those interact with the block size preferred
by the Areca.
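A sketch of the kind of experiment I mean (chunk and block sizes are only
examples, and recreating the array is for benchmarking only - it destroys
whatever is on it):

mdadm --stop /dev/md0
mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=32 /dev/sdc /dev/sdd
dd if=/dev/md0 of=/dev/null bs=1M count=4000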
> We are considering using MD to combine HW RAID controllers with battery
> backup support for better data protection.
Maybe. All this does is permit the HW controller to reorder transactions,
which is not going to matter much if your loads are, in fact, sequential.
> In this scenario md should do
> no write caching.
In my humble understanding, MD doesn't do write caching.
> Is it possible to use something like O_DIRECT with md?
Certainly (exactly O_DIRECT). This is mainly an instruction to the
page cache, not to MD. I presume O_DIRECT mainly just follows a write
with a barrier, which MD can respect and pass on to the Areca driver
(which presumably also respects it, though the point of a battery-backed
cache would be to let the barrier complete before the IO...)
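From the shell you can try it with dd's direct flags, assuming your
coreutils is new enough to have them:

dd if=/dev/md0 of=/dev/null bs=1M count=2000 iflag=direct
dd if=/dev/zero of=/dev/md0 bs=1M count=2000 oflag=direct   # destroys data on md0!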
* Re: RAID 0 over HW RAID
From: Mirko Benz @ 2006-05-11 13:20 UTC
To: Mark Hahn; +Cc: linux-raid
Hello,
/sys/block/sdc/queue/max_sectors_kb is 256 for both HW RAID devices.
We have tested with larger block sizes (256K, 1 MB), which actually gives
slightly lower performance. Access is sequential.
We did some more performance tests with dd and ran into two strange issues
that I have no explanation for.
1)
test:~# dd if=/dev/sdc of=/dev/null bs=128k count=30000
30000+0 records in
30000+0 records out
3932160000 bytes transferred in 11.311464 seconds (347626088 bytes/sec)
test:~# dd if=/dev/sdc1 of=/dev/null bs=128k count=30000
30000+0 records in
30000+0 records out
3932160000 bytes transferred in 21.004938 seconds (187201694 bytes/sec)
Read performance from the same HW RAID differs between the entire device
(sdc) and a partition on it (sdc1).
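Two things that might explain the difference and could be compared
(commands are illustrative):

blockdev --getra /dev/sdc    # read-ahead of the whole device, in 512-byte sectors
blockdev --getra /dev/sdc1   # read-ahead of the partition
fdisk -lu /dev/sdc           # partition start sector, i.e. alignment vs. the 64k stripe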
2)
test:~# dd if=/dev/md0 of=/dev/null bs=128k count=30000
30000+0 records in
30000+0 records out
3932160000 bytes transferred in 9.950705 seconds (395163959 bytes/sec)
test:~# dd if=/dev/md0 of=/dev/null bs=128k count=30000 skip=1000
30000+0 records in
30000+0 records out
3932160000 bytes transferred in 6.398646 seconds (614530000 bytes/sec)
When skipping some megabytes, performance improves significantly and is
almost the sum of the two HW RAID controllers.
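To rule out caching effects from the earlier runs, the comparison could be
repeated with the page cache dropped in between (drop_caches should be
available from about 2.6.16 on):

sync; echo 1 > /proc/sys/vm/drop_caches
dd if=/dev/md0 of=/dev/null bs=128k count=30000
sync; echo 1 > /proc/sys/vm/drop_caches
dd if=/dev/md0 of=/dev/null bs=128k count=30000 skip=1000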
Regards,
Mirko