linux-raid.vger.kernel.org archive mirror
* Random IO with md raid
@ 2009-12-04 22:31 Matthieu Patou
  2009-12-05 14:20 ` Asdo
  0 siblings, 1 reply; 7+ messages in thread
From: Matthieu Patou @ 2009-12-04 22:31 UTC (permalink / raw)
  To: linux-raid

Hello,

I'm reviewing a new server with a 3ware 9690 controller that has a 
couple of SATA drives connected.

In order to see how software RAID behaves in comparison with the RAID 
controller, I set up three volumes:

* 1 raid 1 volume of 2 1 TB hard drives
* 2 single volumes of 1 TB hard drive each

I assembled the two single volumes into an md RAID 1 array.

Both are formatted with XFS and mounted with noatime and nobarrier.
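
Roughly, the setup looked like this (the device names and mount point 
below are placeholders rather than the exact ones used):

  # assemble the two single-drive units into an md RAID 1 array
  # (/dev/sdb and /dev/sdc are placeholder names)
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
  # both the 3ware RAID 1 unit and the md array get the same filesystem
  # and mount options
  mkfs.xfs /dev/md0
  mount -o noatime,nobarrier /dev/md0 /mnt/test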

Sequential reads and writes are quite similar between the software RAID 
and the 3ware RAID.

But when it comes to random writes and random reads I find something 
strange.

The machine has 1 GB of memory, so I made iozone use a 2 GB file with 
4 KB records.
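
The iozone run was roughly the following (the exact flags here are an 
approximation of what I used; -O reports results in operations per second):

  # 2 GB file, 4 KB records; sequential write/read, then random read/write
  iozone -s 2g -r 4k -i 0 -i 1 -i 2 -O -f /mnt/test/iozone.tmp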

On 3ware (in operations per second):
Random read:  193
Random write: 1232

On md
Random read:  178
Random write: 361

First, I am a bit worried that I can get such good random results with 
the 3ware card; I guess I am seeing some cache effect (there is 256 MB 
of cache on the card).
I'm also surprised that the software RAID is not able to take advantage 
of this cache effect in the same proportion (1 to 4).

I've been searching the Internet for tips on RAID tuning, but most of 
what turns up is about sequential access (e.g. readahead).

Does anyone have any ideas?

Matthieu.


* Re: Random IO with md raid
  2009-12-04 22:31 Random IO with md raid Matthieu Patou
@ 2009-12-05 14:20 ` Asdo
  2009-12-07 22:59   ` Matthieu Patou
  0 siblings, 1 reply; 7+ messages in thread
From: Asdo @ 2009-12-05 14:20 UTC (permalink / raw)
  To: Matthieu Patou; +Cc: linux-raid

Matthieu Patou wrote:
> * 1 raid 1 volume of 2 1 TB hard drives
Are these drives also on the 3ware? Are they exported as jbod?
> * 2 single volumes of 1 TB hard drive each
> ...
> Does anyone have any ideas?

Try anticipatory, deadline and noop schedulers on disks (not CFQ)
Try setting readahead to exactly 4K, or set it to lowest possible value 
(I'm not sure what is best), since this is random access...
Increase stripe_cache_size as high as you can.
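
Something like this, for example (sdb, sdc and md0 below are just example 
names):

  # try a different elevator on the member disks
  echo deadline > /sys/block/sdb/queue/scheduler
  echo deadline > /sys/block/sdc/queue/scheduler
  # readahead is set in 512-byte sectors, so 8 = 4 KB
  blockdev --setra 8 /dev/md0
  # stripe cache, in pages per member device (where the array type has one)
  echo 8192 > /sys/block/md0/md/stripe_cache_size
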
Let us/me know the results afterwards...
Thank you
Asdo



* Re: Random IO with md raid
  2009-12-05 14:20 ` Asdo
@ 2009-12-07 22:59   ` Matthieu Patou
  2009-12-09 16:36     ` Goswin von Brederlow
  0 siblings, 1 reply; 7+ messages in thread
From: Matthieu Patou @ 2009-12-07 22:59 UTC (permalink / raw)
  To: Asdo; +Cc: linux-raid

On 05/12/2009 17:20, Asdo wrote:
> Matthieu Patou wrote:
>> * 1 raid 1 volume of 2 1 TB hard drives
> Are these drives also on the 3ware? Are they exported as jbod?
I suppose you are speaking of the 2 single drives. If so, yes, they are 
also connected to the 3ware controller; they are not exported as JBOD 
but as separate hard drives.
>> * 2 single volumes of 1 TB hard drive each
>> ...
>> Does anyone have any ideas?
>
> Try anticipatory, deadline and noop schedulers on disks (not CFQ)
> Try setting readahead to exactly 4K, or set it to lowest possible value
I tried all the different schedulers, both with a 4 KB readahead and with 
the default value, and the software RAID 1 still does 1/2 to 1/3 of the 
hardware RAID 1 for random write I/O.
> (I'm not sure what is best), since this is random access...
> Increase stripe_cache_size as high as you can.
For RAID 1 there is no stripe_cache_size, so I didn't set it...
> Let us/me know the results afterwards...

I also ran a simple test with the CFQ scheduler and no other tuning, but 
with the controller cache turned off, and in that case the results of the 
hardware RAID 1 and the software RAID 1 are very close.
It looks like Linux is not able to use the controller's onboard cache as 
efficiently as the 3ware controller itself, at least for random I/O.
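
For reference, I toggled the write cache on the 3ware unit with tw_cli, 
roughly like this (the controller and unit numbers here are only an example):

  # disable the write cache on unit 0 of controller 0
  tw_cli /c0/u0 set cache=off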

Any other ideas?

Matthieu


* Re: Random IO with md raid
  2009-12-07 22:59   ` Matthieu Patou
@ 2009-12-09 16:36     ` Goswin von Brederlow
  2009-12-14  9:06       ` Matthieu Patou
  0 siblings, 1 reply; 7+ messages in thread
From: Goswin von Brederlow @ 2009-12-09 16:36 UTC (permalink / raw)
  To: Matthieu Patou; +Cc: Asdo, linux-raid

Matthieu Patou <mat@matws.net> writes:

> On 05/12/2009 17:20, Asdo wrote:
>> Matthieu Patou wrote:
>>> * 1 raid 1 volume of 2 1 TB hard drives
>> Are these drives also on the 3ware? Are they exported as jbod?
> I suppose you are speaking of the 2 single drives. If so, yes, they are
> also connected to the 3ware controller; they are not exported as JBOD
> but as separate hard drives.
>>> * 2 single volumes of 1 TB hard drive each
>>> ...
>>> Does anyone have any ideas?
>>
>> Try anticipatory, deadline and noop schedulers on disks (not CFQ)
>> Try setting readahead to exactly 4K, or set it to lowest possible value
> I tried all the different schedulers, both with a 4 KB readahead and with
> the default value, and the software RAID 1 still does 1/2 to 1/3 of the
> hardware RAID 1 for random write I/O.
>> (I'm not sure what is best), since this is random access...
>> Increase stripe_cache_size as high as you can.
> For RAID 1 there is no stripe_cache_size, so I didn't set it...
>> Let us/me know the results afterwards...
>
> I also ran a simple test with the CFQ scheduler and no other tuning, but
> with the controller cache turned off, and in that case the results of the
> hardware RAID 1 and the software RAID 1 are very close.
> It looks like Linux is not able to use the controller's onboard cache as
> efficiently as the 3ware controller itself, at least for random I/O.
>
> Any other ideas?
>
> Matthieu

Software RAID 1 will write twice as much data. That means twice as much
data goes over the system bus and into the controller cache.
Effectively you have half the cache size. Maybe that is all you see.

MfG
        Goswin



* Re: Random IO with md raid
  2009-12-09 16:36     ` Goswin von Brederlow
@ 2009-12-14  9:06       ` Matthieu Patou
  2009-12-14 10:13         ` Michael Evans
  0 siblings, 1 reply; 7+ messages in thread
From: Matthieu Patou @ 2009-12-14  9:06 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: Asdo, linux-raid

Hello Goswin,
 > Software RAID 1 will write twice as much data. That means twice as much
 > data goes over the system bus and into the controller cache.
 > Effectively you have half the cache size. Maybe that is all you see.
 >
 > MfG
 >          Goswin
 >
Your idea seems logical, and I took a few hours today to verify it; it 
is indeed the case, as accessing the disks without software RAID leads 
to almost the same results as with hardware RAID.

Matthieu.


* Re: Random IO with md raid
  2009-12-14  9:06       ` Matthieu Patou
@ 2009-12-14 10:13         ` Michael Evans
  2009-12-16 11:32           ` Goswin von Brederlow
  0 siblings, 1 reply; 7+ messages in thread
From: Michael Evans @ 2009-12-14 10:13 UTC (permalink / raw)
  To: Matthieu Patou; +Cc: Goswin von Brederlow, Asdo, linux-raid

On Mon, Dec 14, 2009 at 1:06 AM, Matthieu Patou <mat@matws.net> wrote:
> Hello Goswin,
>> Software RAID 1 will write twice as much data. That means twice as much
>> data goes over the system bus and into the controller cache.
>> Effectively you have half the cache size. Maybe that is all you see.
>>
>> MfG
>>          Goswin
>>
> Your idea seems logical, and I took a few hours today to verify it; it is
> indeed the case, as accessing the disks without software RAID leads to
> almost the same results as with hardware RAID.
>
> Matthieu.

Thinking about it, the part about doubling the data going over the
system bus is correct.  You must still push the operation out to each
device's buffer.  However, for -most- modern systems the system bus will
not be the bottleneck for a reasonable number of drives.

Further, the latter half, which I'd skimmed over the first time, is
utterly incorrect.  Each drive would still only see the commands
targeting that drive.  That should be virtually the same as, if not
identical to, the single-drive case.

The two most likely bottlenecks are single-drive write speed, and any
IO barriers that are used to ensure file system consistency in the
event of sudden interruption.
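
One way to check is to watch the per-device statistics while the 
random-write test runs, for example with iostat from sysstat (the device 
names below are only examples):

  # extended per-device stats every second; compare the md members and the
  # 3ware unit, and look at await/%util to see which device is saturated
  iostat -dx 1 sdb sdc md0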


* Re: Random IO with md raid
  2009-12-14 10:13         ` Michael Evans
@ 2009-12-16 11:32           ` Goswin von Brederlow
  0 siblings, 0 replies; 7+ messages in thread
From: Goswin von Brederlow @ 2009-12-16 11:32 UTC (permalink / raw)
  To: Michael Evans; +Cc: Matthieu Patou, Goswin von Brederlow, Asdo, linux-raid

Michael Evans <mjevans1983@gmail.com> writes:

> On Mon, Dec 14, 2009 at 1:06 AM, Matthieu Patou <mat@matws.net> wrote:
>> Hello Goswin,
>>> Software RAID 1 will write twice as much data. That means twice as much
>>> data goes over the system bus and into the controller cache.
>>> Effectively you have half the cache size. Maybe that is all you see.
>>>
>>> MfG
>>>          Goswin
>>>
>> Your idea seems logical, and I took a few hours today to verify it; it is
>> indeed the case, as accessing the disks without software RAID leads to
>> almost the same results as with hardware RAID.
>>
>> Matthieu.
>
> Thinking about it, the part about doubling the data going over the
> system bus is correct.  You must still push the operation out to each
> device's buffer.  However, for -most- modern systems the system bus will
> not be the bottleneck for a reasonable number of drives.
>
> Further, the latter half, which I'd skimmed over the first time, is
> utterly incorrect.  Each drive would still only see the commands
> targeting that drive.  That should be virtually the same as, if not
> identical to, the single-drive case.

The controller itself has cache, and that cache is shared between all drives.

> The two most likely bottlenecks are single-drive write speed, and any
> IO barriers that are used to ensure file system consistency in the
> event of sudden interruption.

MfG
        Goswin


Thread overview: 7+ messages
2009-12-04 22:31 Random IO with md raid Matthieu Patou
2009-12-05 14:20 ` Asdo
2009-12-07 22:59   ` Matthieu Patou
2009-12-09 16:36     ` Goswin von Brederlow
2009-12-14  9:06       ` Matthieu Patou
2009-12-14 10:13         ` Michael Evans
2009-12-16 11:32           ` Goswin von Brederlow
