public inbox for linux-kernel@vger.kernel.org
* Poor Software RAID-0 performance with 2.6.14.2
@ 2005-11-21 20:31 Lars Roland
  2005-11-21 20:47 ` Lennart Sorensen
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Lars Roland @ 2005-11-21 20:31 UTC (permalink / raw)
  To: Linux-Kernel

I have created a stripe across two 500Gb disks located on separate IDE
channels using:

mdadm -Cv /dev/md0 -c32 -n2 -l0 /dev/hdb /dev/hdd

the performance is awful on both kernel 2.6.12.5 and 2.6.14.2 (even
with hdparm and blockdev tuning); both bonnie++ and hdparm (output
included below) show a single disk operating faster than the stripe:

----
dkstorage01:~# hdparm -t /dev/md0
/dev/md0:
 Timing buffered disk reads:  182 MB in  3.01 seconds =  60.47 MB/sec

dkstorage02:~# hdparm -t /dev/hdc1
/dev/hdc1:
Timing buffered disk reads:  184 MB in  3.02 seconds =  60.93 MB/sec
----

I am aware of the CPU overhead of software RAID, but such a
degradation should not occur with RAID-0, especially not when the OS
is located on a separate SCSI disk - the IDE disks should just be
ready to work.

There have been some earlier reports of this problem, but they all
seem to end more or less inconclusively (here is one:
http://kerneltrap.org/node/4745). Some people favor switching to
dmraid with device mapper - is this the de facto standard today?

Examining the setup with mdadm gives:
-------
dkstorage01:~# mdadm -E /dev/hdb
/dev/hdb:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : 7edc2c10:6cb402e8:06d9bd91:57b11f01
  Creation Time : Mon Nov 21 19:38:30 2005
     Raid Level : raid0
    Device Size : 488386496 (465.76 GiB 500.11 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0

    Update Time : Mon Nov 21 19:38:30 2005
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 9766c3ec - correct
         Events : 0.1

     Chunk Size : 4K

      Number   Major   Minor   RaidDevice State
this     1       3       64        1      active sync   /dev/hdb

   0     0      22       64        0      active sync   /dev/hdd
   1     1       3       64        1      active sync   /dev/hdb
-------

mdadm is v1.12.0.



--
Lars Roland

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Poor Software RAID-0 performance with 2.6.14.2
  2005-11-21 20:31 Poor Software RAID-0 performance with 2.6.14.2 Lars Roland
@ 2005-11-21 20:47 ` Lennart Sorensen
  2005-11-21 21:58   ` Neil Brown
  2005-11-22  9:46   ` Lars Roland
  2005-11-21 21:56 ` Neil Brown
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 10+ messages in thread
From: Lennart Sorensen @ 2005-11-21 20:47 UTC (permalink / raw)
  To: Lars Roland; +Cc: Linux-Kernel

On Mon, Nov 21, 2005 at 09:31:14PM +0100, Lars Roland wrote:
> I have created a stripe across two 500Gb disks located on separate IDE
> channels using:
> 
> mdadm -Cv /dev/md0 -c32 -n2 -l0 /dev/hdb /dev/hdd

Does -l0 equal stripe or linear?  The mdadm man page doesn't seem clear
on that to me.

If it defaults to linear, then you shouldn't expect any performance gain,
since that would just stick one drive after the other (no striping).
Try explicitly stating -l stripe instead of -l 0.

> the performance is awful on both kernel 2.6.12.5 and 2.6.14.2 (even
> with hdparm and blockdev tuning), both bonnie++ and hdparm (included
> below) shows a single disk operating faster than the stripe:
> 
> ----
> dkstorage01:~# hdparm -t /dev/md0
> /dev/md0:
>  Timing buffered disk reads:  182 MB in  3.01 seconds =  60.47 MB/sec
> 
> dkstorage02:~# hdparm -t /dev/hdc1
> /dev/hdc1:
> Timing buffered disk reads:  184 MB in  3.02 seconds =  60.93 MB/sec

How about also testing at least one of the drives actually involved in
the raid?  Although I assume they are identical in your case, given the
numbers.

Did you test this with other kernel versions (older ones) to see if it
was better in the past?

Any idea where the ide controller is connected?  If it is PCI the whole
bus only has 133MB/s to give on many systems (some have more of course),
so maybe 60M/s is quite good.
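Len's bus-bandwidth point can be sanity-checked with back-of-envelope
arithmetic (a toy sketch with nominal figures, not measurements from this
machine; `pci_headroom` is a made-up name for illustration):

```python
def pci_headroom(per_disk_mb_s=60, ndisks=2, bus_mb_s=133):
    """How much of a classic 32-bit/33 MHz PCI bus (133 MB/s theoretical
    peak, a nominal figure) is left once every disk streams at the
    ~60 MB/s each drive measures above?"""
    return bus_mb_s - ndisks * per_disk_mb_s

# Two drives at ~60 MB/s nearly saturate the bus:
print(pci_headroom())   # -> 13 (MB/s of headroom)
```

Which is roughly consistent with the ~36 MB/s per drive the parallel
hdparm test later in the thread reports.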

Len Sorensen


* Re: Poor Software RAID-0 performance with 2.6.14.2
  2005-11-21 20:31 Poor Software RAID-0 performance with 2.6.14.2 Lars Roland
  2005-11-21 20:47 ` Lennart Sorensen
@ 2005-11-21 21:56 ` Neil Brown
  2005-11-22 10:04   ` Lars Roland
  2005-11-22  8:29 ` Helge Hafting
  2005-11-22 17:26 ` Bill Davidsen
  3 siblings, 1 reply; 10+ messages in thread
From: Neil Brown @ 2005-11-21 21:56 UTC (permalink / raw)
  To: Lars Roland; +Cc: Linux-Kernel

On Monday November 21, lroland@gmail.com wrote:
> I have created a stripe across two 500Gb disks located on separate IDE
> channels using:
> 
> mdadm -Cv /dev/md0 -c32 -n2 -l0 /dev/hdb /dev/hdd
> 
> the performance is awful on both kernel 2.6.12.5 and 2.6.14.2 (even
> with hdparm and blockdev tuning), both bonnie++ and hdparm (included
> below) shows a single disk operating faster than the stripe:
> 
> ----
> dkstorage01:~# hdparm -t /dev/md0
> /dev/md0:
>  Timing buffered disk reads:  182 MB in  3.01 seconds =  60.47 MB/sec
> 
> dkstorage02:~# hdparm -t /dev/hdc1
> /dev/hdc1:
> Timing buffered disk reads:  184 MB in  3.02 seconds =  60.93 MB/sec
> ----

Could you try hdparm tests on the two drives in parallel?
   hdparm -t /dev/hdb & hdparm -t /dev/hdd

It could be that the controller doesn't handle parallel traffic very
well.


> 
> I am aware of cpu overhead with software raid but such a degradation
> should not be the case with raid 0, especially not when the OS is
> located on a separate SCSI disk - the IDE disks should just be ready
> to work.

raid0 has essentially 0 cpu overhead.  It would be maybe a couple of
hundred instructions which would be lost in the noise.  It just
figures out which drive each request should go to, and directs it
there.
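The mapping Neil describes can be sketched in a few lines (a toy model
for illustration only, not the md source; the 2-disk, 32k-chunk figures
come from the mdadm command quoted above, sizes in 512-byte sectors):

```python
def raid0_map(sector, chunk_sectors=64, ndisks=2):
    """Toy raid0 address mapping: work out which member disk a sector
    lands on and at what offset -- a handful of integer operations,
    which is why the CPU cost is lost in the noise.
    chunk_sectors=64 corresponds to the 32k (-c32) chunk above."""
    chunk = sector // chunk_sectors
    disk = chunk % ndisks                 # chunks round-robin across members
    offset = (chunk // ndisks) * chunk_sectors + sector % chunk_sectors
    return disk, offset

# Consecutive chunks alternate between the two drives:
print(raid0_map(0))      # -> (0, 0)
print(raid0_map(64))     # -> (1, 0)
print(raid0_map(128))    # -> (0, 64)
```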


> 
> There have been some earlier reports of this problem, but they all
> seem to end more or less inconclusively (here is one:
> http://kerneltrap.org/node/4745). Some people favor switching to
> dmraid with device mapper - is this the de facto standard today?
> 

The kerneltrap reference is about raid5.
raid5 is implemented very differently to raid0.

It might be worth experimenting with different read-ahead values using
the 'blockdev' command.  Alternatively, use a larger chunk size.

I don't think there is a de facto standard.  Many people use md.  Many
use dm.  

NeilBrown


* Re: Poor Software RAID-0 performance with 2.6.14.2
  2005-11-21 20:47 ` Lennart Sorensen
@ 2005-11-21 21:58   ` Neil Brown
  2005-11-22  9:46   ` Lars Roland
  1 sibling, 0 replies; 10+ messages in thread
From: Neil Brown @ 2005-11-21 21:58 UTC (permalink / raw)
  To: Lennart Sorensen; +Cc: Lars Roland, Linux-Kernel

On Monday November 21, lsorense@csclub.uwaterloo.ca wrote:
> On Mon, Nov 21, 2005 at 09:31:14PM +0100, Lars Roland wrote:
> > I have created a stripe across two 500Gb disks located on separate IDE
> > channels using:
> > 
> > mdadm -Cv /dev/md0 -c32 -n2 -l0 /dev/hdb /dev/hdd
> 
> Does -l0 equal stripe or linear?  The mdadm man page doesn't seem clear
> on that to me.

0 is raid0.  I thought that was so blatantly obvious that it wasn't
worth spelling it out in the man page.  Maybe I was wrong :-(.

NeilBrown


* Re: Poor Software RAID-0 performance with 2.6.14.2
  2005-11-21 20:31 Poor Software RAID-0 performance with 2.6.14.2 Lars Roland
  2005-11-21 20:47 ` Lennart Sorensen
  2005-11-21 21:56 ` Neil Brown
@ 2005-11-22  8:29 ` Helge Hafting
  2005-11-22 17:26 ` Bill Davidsen
  3 siblings, 0 replies; 10+ messages in thread
From: Helge Hafting @ 2005-11-22  8:29 UTC (permalink / raw)
  To: Lars Roland; +Cc: Linux-Kernel

On Mon, Nov 21, 2005 at 09:31:14PM +0100, Lars Roland wrote:
> I have created a stripe across two 500Gb disks located on separate IDE
> channels using:
> 
> mdadm -Cv /dev/md0 -c32 -n2 -l0 /dev/hdb /dev/hdd
> 
> the performance is awful on both kernel 2.6.12.5 and 2.6.14.2 (even
> with hdparm and blockdev tuning), both bonnie++ and hdparm (included
> below) shows a single disk operating faster than the stripe:
> 
To rule out hardware problems (hardware not as parallel as you might think):

Try running the performance test (bonnie++ or hdparm)
on both /dev/hdb and /dev/hdd at the same time. 

Two hdparms on different disks should not take longer than one,
unless you have bad hardware.

One bonnie with size x MB takes y minutes to run.
Two bonnies, each of size x/2 MB, should take between
y/2 and y minutes to run. If they need more, then something
is wrong again, and that would explain the bad RAID performance.
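Helge's rule of thumb can be written down as a tiny check (toy numbers
for illustration, not measurements; the function name is made up):

```python
def parallel_hardware_ok(t_one_full, t_two_halves):
    """Helge's rule of thumb: if one bonnie of size x takes y minutes,
    two parallel bonnies of size x/2 each should finish somewhere
    between y/2 and y.  Anything above y suggests the two channels
    are not really independent."""
    return t_one_full / 2 <= t_two_halves <= t_one_full

print(parallel_hardware_ok(10.0, 6.0))    # True: genuinely parallel hardware
print(parallel_hardware_ok(10.0, 12.0))   # False: contention somewhere
```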

Helge Hafting




* Re: Poor Software RAID-0 performance with 2.6.14.2
  2005-11-21 20:47 ` Lennart Sorensen
  2005-11-21 21:58   ` Neil Brown
@ 2005-11-22  9:46   ` Lars Roland
  1 sibling, 0 replies; 10+ messages in thread
From: Lars Roland @ 2005-11-22  9:46 UTC (permalink / raw)
  To: Lennart Sorensen; +Cc: Linux-Kernel

On 11/21/05, Lennart Sorensen <lsorense@csclub.uwaterloo.ca> wrote:
> > dkstorage01:~# hdparm -t /dev/md0
> > /dev/md0:
> >  Timing buffered disk reads:  182 MB in  3.01 seconds =  60.47 MB/sec
> >
> > dkstorage02:~# hdparm -t /dev/hdc1
> > /dev/hdc1:
> > Timing buffered disk reads:  184 MB in  3.02 seconds =  60.93 MB/sec
>
> How about at least testing one of the drives involved in the raid,
> although I assume they are identical in your case given the numbers.

There are four identical drives in the machine, although I only stripe
on two of them. I can assure you that I get the same numbers from all
the drives - I should of course have put this info in the original post.

>
> Did you test this with other kernel versions (older ones) to see if it
> was better in the past?

Also tried 2.4.27 and 2.4.30 - no difference there.


* Re: Poor Software RAID-0 performance with 2.6.14.2
  2005-11-21 21:56 ` Neil Brown
@ 2005-11-22 10:04   ` Lars Roland
  0 siblings, 0 replies; 10+ messages in thread
From: Lars Roland @ 2005-11-22 10:04 UTC (permalink / raw)
  To: Neil Brown; +Cc: Linux-Kernel

On 11/21/05, Neil Brown <neilb@suse.de> wrote:
> On Monday November 21, lroland@gmail.com wrote:
> > I have created a stripe across two 500Gb disks located on separate IDE
> > channels using:
> >
> > mdadm -Cv /dev/md0 -c32 -n2 -l0 /dev/hdb /dev/hdd
> >
> > the performance is awful on both kernel 2.6.12.5 and 2.6.14.2 (even
> > with hdparm and blockdev tuning), both bonnie++ and hdparm (included
> > below) shows a single disk operating faster than the stripe:
> >
> > ----
> > dkstorage01:~# hdparm -t /dev/md0
> > /dev/md0:
> >  Timing buffered disk reads:  182 MB in  3.01 seconds =  60.47 MB/sec
> >
> > dkstorage02:~# hdparm -t /dev/hdc1
> > /dev/hdc1:
> > Timing buffered disk reads:  184 MB in  3.02 seconds =  60.93 MB/sec
> > ----
>
> Could you try hdparm tests on the two drives in parallel?
>    hdparm -t /dev/hdb & hdparm -t /dev/hdd
>
> It could be that the controller doesn't handle parallel traffic very
> well.
>

Hmm, I should of course have thought of this earlier - it does indeed
seem that the controller does not handle parallel traffic very well:

-----------
dkstorage01:~# hdparm -t /dev/hdb
/dev/hdb:
 Timing buffered disk reads:  112 MB in  3.02 seconds =  37.09 MB/sec

dkstorage01:~# hdparm -t /dev/hdd
/dev/hdd:
 Timing buffered disk reads:  108 MB in  3.02 seconds =  35.76 MB/sec
-----------

Bonnie test shows the same picture.

> raid0 has essentially 0 cpu overhead.  It would be maybe a couple of
> hundred instructions which would be lost in the noise.  It just
> figures out which drive each request should go to, and directs it
> there.

Yeah, so it is probably just a poor controller.


--
Lars Roland


* Re: Poor Software RAID-0 performance with 2.6.14.2
  2005-11-21 20:31 Poor Software RAID-0 performance with 2.6.14.2 Lars Roland
                   ` (2 preceding siblings ...)
  2005-11-22  8:29 ` Helge Hafting
@ 2005-11-22 17:26 ` Bill Davidsen
  2005-11-22 18:23   ` Paul Clements
  3 siblings, 1 reply; 10+ messages in thread
From: Bill Davidsen @ 2005-11-22 17:26 UTC (permalink / raw)
  To: Lars Roland, Linux Kernel Mailing List, Linux RAID M/L

Lars Roland wrote:
> I have created a stripe across two 500Gb disks located on separate IDE
> channels using:
> 
> mdadm -Cv /dev/md0 -c32 -n2 -l0 /dev/hdb /dev/hdd
> 
> the performance is awful on both kernel 2.6.12.5 and 2.6.14.2 (even
> with hdparm and blockdev tuning), both bonnie++ and hdparm (included
> below) shows a single disk operating faster than the stripe:

In looking at this I found something interesting, even though you 
identified your problem before I was able to use the data for its 
intended purpose. So, other than suggesting that the stripe size is too 
small, I have nothing to add there - your hardware is the issue.

I have two ATA drives connected, and each has two partitions. The first 
partition of each is mirrored for reliability with default 64k chunks, 
and the second is striped, with 512k chunks (I write a lot of 100MB 
files to this f/s).

Reading the individual devices with dd, I saw a transfer rate of about 
60MB/s, while the striped md1 device gave just under 120MB/s (60.3573 
and 119.6458 MB/s, to be exact). However, the mirrored md0 also gave 
just 60MB/s read speed.

One of the advantages of mirroring is that if there is heavy read load 
when one drive is busy there is another copy of the data on the other 
drive(s). But doing 1MB reads on the mirrored device did not show that 
the kernel took advantage of this in any way. In fact, it looks as if 
all the reads are going to the first device, even with multiple 
processes running. Does the md code now set "write-mostly" by default 
and only go to the redundant drives if the first fails?

I won't be able to do a lot of testing until Thursday, or perhaps 
Wednesday night, but that is not what I expected and not what I want. 
I do mirroring on web and news servers to spread the head motion; now 
I will be looking at the stats to see if that's actually happening.

I added the raid M/L to the addresses, since this is getting to be a 
general RAID question.

-- 
    -bill davidsen (davidsen@tmr.com)
"The secret to procrastination is to put things off until the
  last possible moment - but no longer"  -me


* Re: Poor Software RAID-0 performance with 2.6.14.2
  2005-11-22 17:26 ` Bill Davidsen
@ 2005-11-22 18:23   ` Paul Clements
  2005-11-22 21:39     ` Bill Davidsen
  0 siblings, 1 reply; 10+ messages in thread
From: Paul Clements @ 2005-11-22 18:23 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Lars Roland, Linux Kernel Mailing List, Linux RAID M/L

Bill Davidsen wrote:

> One of the advantages of mirroring is that if there is heavy read load 
> when one drive is busy there is another copy of the data on the other 
> drive(s). But doing 1MB reads on the mirrored device did not show that 
> the kernel took advantage of this in any way. In fact, it looks as if 
> all the reads are going to the first device, even with multiple 
> processes running. Does the md code now set "write-mostly" by default 
> and only go to the redundant drives if the first fails?

No, it doesn't use write-mostly by default. The way raid1 read balancing 
works (in recent kernels) is this:

- sequential reads continue to go to the first disk

- for non-sequential reads, the code tries to pick the disk whose head 
is "closest" to the sector that needs to be read

So even if the reads aren't exactly sequential, you probably still end 
up reading from the first disk most of the time. I imagine with a more 
random read pattern you'd see the second disk getting used.
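The two rules Paul lists can be sketched as a toy balancer (my own
simplification for illustration, not the actual md raid1 code; the head
positions and function name are hypothetical):

```python
def pick_mirror(sector, heads, last_end):
    """Toy raid1 read balancer following the two rules above:
    sector   -- first sector of the incoming read
    heads    -- current head position of each mirror
    last_end -- where the previous read finished (the sequential test)"""
    if sector == last_end:          # sequential reads stay on the first disk
        return 0
    # non-sequential: pick the mirror whose head is closest
    return min(range(len(heads)), key=lambda i: abs(heads[i] - sector))

# A sequential stream keeps hitting disk 0:
print(pick_mirror(1000, heads=[1000, 500000], last_end=1000))    # -> 0
# Only a read landing near the second disk's head goes there:
print(pick_mirror(499000, heads=[1000, 500000], last_end=2000))  # -> 1
```

This also shows why nearly-sequential loads still pile up on the first
disk: its head tends to stay closest to the next requested sector.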

--
Paul


* Re: Poor Software RAID-0 performance with 2.6.14.2
  2005-11-22 18:23   ` Paul Clements
@ 2005-11-22 21:39     ` Bill Davidsen
  0 siblings, 0 replies; 10+ messages in thread
From: Bill Davidsen @ 2005-11-22 21:39 UTC (permalink / raw)
  To: Paul Clements; +Cc: Lars Roland, Linux Kernel Mailing List, Linux RAID M/L

Paul Clements wrote:
> Bill Davidsen wrote:
> 
>> One of the advantages of mirroring is that if there is heavy read load 
>> when one drive is busy there is another copy of the data on the other 
>> drive(s). But doing 1MB reads on the mirrored device did not show that 
>> the kernel took advantage of this in any way. In fact, it looks as if 
>> all the reads are going to the first device, even with multiple 
>> processes running. Does the md code now set "write-mostly" by default 
>> and only go to the redundant drives if the first fails?
> 
> 
> No, it doesn't use write-mostly by default. The way raid1 read balancing 
> works (in recent kernels) is this:
> 
> - sequential reads continue to go to the first disk
> 
> - for non-sequential reads, the code tries to pick the disk whose head 
> is "closest" to the sector that needs to be read
> 
> So even if the reads aren't exactly sequential, you probably still end 
> up reading from the first disk most of the time. I imagine with a more 
> random read pattern you'd see the second disk getting used.

Thanks for the clarification. I think the current method is best for 
most cases. I have to think about how large a file you would need to 
see any saving in transfer time, given that you have to consider the 
slowest seek, drives doing other things on a busy system, etc.

-- 
    -bill davidsen (davidsen@tmr.com)
"The secret to procrastination is to put things off until the
  last possible moment - but no longer"  -me



end of thread, other threads:[~2005-11-23 15:51 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-11-21 20:31 Poor Software RAID-0 performance with 2.6.14.2 Lars Roland
2005-11-21 20:47 ` Lennart Sorensen
2005-11-21 21:58   ` Neil Brown
2005-11-22  9:46   ` Lars Roland
2005-11-21 21:56 ` Neil Brown
2005-11-22 10:04   ` Lars Roland
2005-11-22  8:29 ` Helge Hafting
2005-11-22 17:26 ` Bill Davidsen
2005-11-22 18:23   ` Paul Clements
2005-11-22 21:39     ` Bill Davidsen
