* md faster than h/w? @ 2006-01-13 7:06 Max Waterman 2006-01-13 14:46 ` Ross Vandegrift 2006-01-14 6:40 ` Mark Hahn 0 siblings, 2 replies; 22+ messages in thread From: Max Waterman @ 2006-01-13 7:06 UTC (permalink / raw) To: linux-raid Hi, I've been trying to increase the i/o performance of a new server. The server is a Dell PowerEdge 2850. It has 2(x2) Intel Xeon 3GHz CPUs, 4GB RAM, a Perc4/DC RAID controller (AKA MegaRAID SCSI 320-2), and we have 5 Fujitsu MAX3073NC drives attached to one of its channels (can't use the other channel due to a missing 'option'). According to Fujitsu's web site, the disks can each do internal IO at up to 147MB/s, and burst up to 320MB/s. According to the LSI Logic web page, the controller can do up to 320MB/s. All theoretical numbers, of course. So, we're trying to measure the performance. We've been using 'bonnie++' and 'hdparm -t'. We're primarily focusing on read performance at the moment. We set up the OS (Debian) on one of the disks (sda), and we're playing around with the others in various configurations. I figured it'd be good to measure the maximum performance of the array, so we have been working with the 4 disks in a raid0 configuration (/dev/sdb). Initially, we were getting 'hdparm -t' numbers around 80MB/s, but this was when we were testing /dev/sdb1 - the (only) partition on the device. When we started testing /dev/sdb, it increased significantly to around 180MB/s. I'm not sure what to conclude from this. In any case, our bonnie++ results weren't so high, at around 100MB/s. Using theoretical numbers as a maximum, we should be able to read at the lesser of 4 times a single drive speed (=588MB/s) and the SCSI bus speed (320MB/s), ie 320MB/s. So, 100MB/s seems like a poor result. I thought I'd try one other thing and that was to configure the drives as JBOD (which is actually having each one as RAID0 in the controller config s/w), and configure as s/w raid0. Doing this initially resulted in a doubling of bonnie++ speed at over 200MB/s, though I have been unable to reproduce this result - the most common result is still about 180MB/s. One further strangeness is that our best results have been while using a uni-processor kernel - 2.6.8. We would prefer it if our best results were with the most recent kernel we have, which is 2.6.15, but no. So, any advice on how to obtain best performance (mainly web and mail server stuff)? Is 180MB/s-200MB/s a reasonable number for this h/w? What numbers do other people see on their raid0 h/w? Any other advice/comments? Max. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: md faster than h/w? 2006-01-13 7:06 md faster than h/w? Max Waterman @ 2006-01-13 14:46 ` Ross Vandegrift 2006-01-13 21:08 ` Lajber Zoltan ` (2 more replies) 2006-01-14 6:40 ` Mark Hahn 1 sibling, 3 replies; 22+ messages in thread From: Ross Vandegrift @ 2006-01-13 14:46 UTC (permalink / raw) To: Max Waterman; +Cc: linux-raid On Fri, Jan 13, 2006 at 03:06:54PM +0800, Max Waterman wrote: > One further strangeness is that our best results have been while using a > uni-processor kernel - 2.6.8. We would prefer it if our best results > were with the most recent kernel we have, which is 2.6.15, but no. Sounds like this is probably a bug. If you have some time to play around with it, I'd try kernels in between and find out exactly where the regression happened. The bug will probably be cleaned up quickly and performance will be back where it should be. > So, any advice on how to obtain best performance (mainly web and mail > server stuff)? > Is 180MB/s-200MB/s a reasonable number for this h/w? > What numbers do other people see on their raid0 h/w? > Any other advice/comments? My employer uses the 1850 more than the 2850, though we do have a few in production. My feeling is that 180-200MB/sec is really excellent throughput. We're comparing apples to oranges, but it'll at least give you an idea. The Dell 1850s are sort of our highest class of machine that we commonly deploy. We have a Supermicro chassis that's exactly like the 1850 but SATA instead of SCSI. On the low-end, we have various P4 Prescott chassis. Just yesterday I was testing disk performance on a low-end box. SATA on a 3Ware controller, RAID1. I was quite pleased to be getting 70-80MB/sec. So my feeling is that your numbers are fairly close to where they should be. You have faster procs, SCSI, and a better RAID card. However, I'd also try RAID1 if you're mostly interested in read speed. Remember that RAID1 lets you balance reads across disks, whereas RAID0 will require each disk in the array to retrieve the data. -- Ross Vandegrift ross@lug.udel.edu "The good Christian should beware of mathematicians, and all those who make empty prophecies. The danger already exists that the mathematicians have made a covenant with the devil to darken the spirit and to confine man in the bonds of Hell." --St. Augustine, De Genesi ad Litteram, Book II, xviii, 37 ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: md faster than h/w? 2006-01-13 14:46 ` Ross Vandegrift @ 2006-01-13 21:08 ` Lajber Zoltan 2006-01-14 1:19 ` Max Waterman 2006-01-14 1:22 ` Max Waterman 2 siblings, 0 replies; 22+ messages in thread From: Lajber Zoltan @ 2006-01-13 21:08 UTC (permalink / raw) Cc: linux-raid Hi, On Fri, 13 Jan 2006, Ross Vandegrift wrote: > > Is 180MB/s-200MB/s a reasonable number for this h/w? > > What numbers do other people see on their raid0 h/w? > > Any other advice/comments? > So my feeling is that your numbers are fairly close to where they > should be. Faster procs, SCSI, and a better RAID card. However, I'd > also try RAID1 if you're mostly interested in read speed. Remember Qlogic state max 190MB/s for their pci-x FC HBA, which came from 2Gbps FC and pci-x. Bye, -=Lajbi=---------------------------------------------------------------- LAJBER Zoltan Szent Istvan Egyetem, Informatika Hivatal Most of the time, if you think you are in trouble, crank that throttle! ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: md faster than h/w? 2006-01-13 14:46 ` Ross Vandegrift 2006-01-13 21:08 ` Lajber Zoltan @ 2006-01-14 1:19 ` Max Waterman 2006-01-14 2:05 ` Ross Vandegrift 2006-01-14 1:22 ` Max Waterman 2 siblings, 1 reply; 22+ messages in thread From: Max Waterman @ 2006-01-14 1:19 UTC (permalink / raw) To: Ross Vandegrift; +Cc: linux-raid Ross Vandegrift wrote: > On Fri, Jan 13, 2006 at 03:06:54PM +0800, Max Waterman wrote: >> One further strangeness is that our best results have been while using a >> uni-processor kernel - 2.6.8. We would prefer it if our best results >> were with the most recent kernel we have, which is 2.6.15, but no. > > Sounds like this is probably a bug. If you have some time to play > around with it, I'd try kernels in between and find out exactly where > the regression happened. The bug will probably be cleaned up quickly > and performance will be back where it should be. > >> So, any advice on how to obtain best performance (mainly web and mail >> server stuff)? >> Is 180MB/s-200MB/s a reasonable number for this h/w? >> What numbers do other people see on their raid0 h/w? >> Any other advice/comments? > > My employer usues the 1850 more than the 2850, though we do have a few > in production. My feeling is that 180-200MB/sec is really excellent > throughput. > > We're comparing apples to oranges, but it'll at least give you an > idea. The Dell 1850s are sortof our highest class of machine that we > commonly deploy. We have a Supermicro chassis that's exactly like > the 1850 but SATA instead of SCSI. On the low-end, we have various P4 > Prescott chassis. > > Just yesterday I was testing disk performance on a low-end box. SATA > on a 3Ware controller, RAID1. I was quite pleased to be getting > 70-80MB/sec. > > So my feeling is that your numbers are fairly close to where they > should be. Faster procs, SCSI, and a better RAID card. However, I'd > also try RAID1 if you're mostly interested in read speed. Remember > that RAID1 lets you balance reads across disks, whereas RAID0 will > require each disk in the array to retrieve the data. > OK, this sounds good. I still wonder where all the theoretical numbers went though. The scsi channel should be able to handle 320MB/s, and we should have enough disks to push that (each disk is 147-320MB/s and we have 4 of them) - theoretically. Why does the bandwidth seem to plateau with two disks - adding more into the raid0 doesn't seem to improve performance at all? Why do I get better numbers using the file for the whole device (is there a better name for it), rather than for a partition (ie /dev/sdb is faster than /dev/sdb1 - by a lot)? Can you explain why raid1 would be faster than raid0? I don't see why that would be... Things I have to try from your email so far are: 1) raid1 - s/w and h/w (we don't care much about capacity, so it's ok) 2) raid0 - h/w, with bonnie++ using no partition table 3) kernels in between 2.6.8 and 2.6.15 Max. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: md faster than h/w? 2006-01-14 1:19 ` Max Waterman @ 2006-01-14 2:05 ` Ross Vandegrift 2006-01-14 8:26 ` Max Waterman 0 siblings, 1 reply; 22+ messages in thread From: Ross Vandegrift @ 2006-01-14 2:05 UTC (permalink / raw) To: Max Waterman; +Cc: linux-raid On Sat, Jan 14, 2006 at 09:19:41AM +0800, Max Waterman wrote: > I still wonder where all the theoretical numbers went though. > > The scsi channel should be able to handle 320MB/s, and we should have > enough disks to push that (each disk is 147-320MB/s and we have 4 of > them) - theoretically. LOL, they went where all theoretical performance numbers go. Wherever that is, and, lemme tell you it's not anywhere near me ::-). While your disks claim 147-320MB/sec I'd bet a whole lot they aren't breaking 100MB/sec. I don't think I've ever seen a single disk beat 80-90MB/sec of raw throughput. The maximum read throughput listed on storagereview.com is 97.4MB/sec: http://www.storagereview.com/php/benchmark/bench_sort.php On top of that, disk seeks are going to make that go way down. 80MB/sec was on an extended read. Seeking around costs time, which affects your throughput. > Why does the bandwidth seem to plateau with two disks - adding more into > the raid0 doesn't seem to improve performance at all? Let's say you read an 8MB file off a disk that runs at 40MB/sec. That means it takes 0.2 seconds to stream that data. If you stripe that disk, and in theory double read performance, you'll complete in 0.1 seconds instead. But if you read 8GB, that'll take you about 200 seconds. Stripe it, and in theory you're down to 100 seconds. Throw in a third disk, you've dropped it to 66 seconds - a smaller payoff than the first disk. If you add a fourth, you can in theory read it in 50 seconds. So the second disk you added cut 100 seconds off the read time, but the fourth only cut off 16. If we go back to the 8MB case, your second disk saved 0.1 seconds. If you added a third, it saved 0.04 seconds. This is probably what you're seeing. And I'll bet you're closer to the 8MB end of the scale than the 8GB end. > Why do I get better numbers using the file for the while device (is > there a better name for it), rather than for a partition (ie /dev/sdb is > faster than /dev/sdb1 - by a lot)? That's a bit weird and I don't have a good explanation. I'd go to linux-kernel@vger.kernel.org with that information, some test cases, and I'll bet it's a bug. Was this true across kernel versions? > Can you explain why raid1 would be faster than raid0? I don't see why > that would be... Though reading is the same in theory, I like RAID1 better ::-). If I were you, I'd test all applicable configurations. But of course we haven't even gotten into write speed... -- Ross Vandegrift ross@lug.udel.edu "The good Christian should beware of mathematicians, and all those who make empty prophecies. The danger already exists that the mathematicians have made a covenant with the devil to darken the spirit and to confine man in the bonds of Hell." --St. Augustine, De Genesi ad Litteram, Book II, xviii, 37 ^ permalink raw reply [flat|nested] 22+ messages in thread
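To make Ross's arithmetic easy to replay, here is a small C program that tabulates the ideal raid0 read times for his hypothetical 40MB/s-per-disk example; the numbers are illustrative only and say nothing about the actual Perc4 hardware in question.

#include <stdio.h>

int main(void)
{
    const double rate = 40.0;                /* per-disk streaming rate, MB/s (Ross's figure) */
    const double sizes[] = { 8.0, 8192.0 };  /* 8 MB and 8 GB workloads, in MB */

    for (int s = 0; s < 2; s++) {
        printf("%.0f MB workload:\n", sizes[s]);
        double prev = sizes[s] / rate;       /* single-disk read time */
        printf("  1 disk : %8.3f s\n", prev);
        for (int disks = 2; disks <= 4; disks++) {
            double t = sizes[s] / (disks * rate);   /* ideal raid0 scaling */
            printf("  %d disks: %8.3f s (saves %.3f s over %d disks)\n",
                   disks, t, prev - t, disks - 1);
            prev = t;
        }
    }
    return 0;
}

Each extra spindle buys a smaller slice of time, and once seeks or the bus dominate, the curve flattens out entirely, which fits the plateau being reported.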
* Re: md faster than h/w? 2006-01-14 2:05 ` Ross Vandegrift @ 2006-01-14 8:26 ` Max Waterman 2006-01-14 10:42 ` Michael Tokarev 2006-01-14 18:14 ` Mark Hahn 0 siblings, 2 replies; 22+ messages in thread From: Max Waterman @ 2006-01-14 8:26 UTC (permalink / raw) To: Ross Vandegrift; +Cc: linux-raid Ross Vandegrift wrote: > On Sat, Jan 14, 2006 at 09:19:41AM +0800, Max Waterman wrote: >> I still wonder where all the theoretical numbers went though. >> >> The scsi channel should be able to handle 320MB/s, and we should have >> enough disks to push that (each disk is 147-320MB/s and we have 4 of >> them) - theoretically. > > LOL, they went where all theoretical performance numbers go. > Whereever that is, and, lemme tell you it's not anywhere near me ::-). :D Good to know, at least :) > > While your disks claim 147-320MB/sec I'd bet a whole lot they aren't > breaking 100MB/sec. I don't think I've ever seen a single disk > beat 80-90MB/sec of raw throughput. That's about what I'm getting for a single disk. > The maximum read throughput > listed on storagereview.com is 97.4MB/sec: > http://www.storagereview.com/php/benchmark/bench_sort.php Ah, a good resource, thanks :) > On top of that, disk seeks are going to make that go way down. > 80MB/sec was on an extended read. Seeking around costs time, which > affects your throughput. Indeed. Looking primarily at the 'Sequential Input/Block', this is the best output I've had from bonnie++ :

            ------Sequential Output------  --Sequential Input--  --Random--
            -Per Char- --Block-- -Rewrite- -Per Char- --Block--  --Seeks---
Size:Chunk  K/sec %CP  K/sec %CP K/sec %CP K/sec %CP  K/sec  %CP  /sec  %CP
2G          48024  96 121412  13 59714  10 47844  95 200264  21  942.8   1

Anything interesting in those numbers? > >> Why does the bandwidth seem to plateau with two disks - adding more into >> the raid0 doesn't seem to improve performance at all? > > Lets say you read an 8MB file off a disk that runs at 40MB/sec. That > means it takes 0.2 seconds to stream that data. If you stripe that > disk, and in theory double read performance, you'll complete in 0.1 > seconds instead. > > But if you read 8GB, that'll take you about 200 seconds. Stripe it, > and in theory you're down to 100 seconds. Throw a third disk, you've > dropped it to 66 seconds - a smaller payoff than the first disk. If > you add a fourth, you can in theory read it in 50 seconds. > > So the second disk you added cut 100 seconds off the read time, but > the fourth only cut off 16. If we go back to back to the 8MB case, > your second disk saved 0.1 seconds. If you added a third, it saved > 0.04 seconds. OK. All makes sense. However, the 'hdparm -t' numbers (didn't try bonnie++) did seem to actually go down (slightly - eg 170MB/s to 160MB/s) when I added the 3rd disk. > > This is probably what you're seeing. And I'll bet you're close to the > 8MB end of the scale than the 8GB end. Well, with bonnie++, it said it was using a 'size' of either 2G (2.6.8) or 7G (2.6.15-smp). I'm not sure why it picked a different size...
>> Why do I get better numbers using the file for the while device (is >> there a better name for it), rather than for a partition (ie /dev/sdb is >> faster than /dev/sdb1 - by a lot)? > > That's a bit weird and I don't have a good explanation. I'd go to > linux-kernel@vger.kernel.org with that information, some test cases, > and I'll bet it's a bug. OK, I'll take the referral for that, thanks :D > > Was this true across kernel versions? > >> Can you explain why raid1 would be faster than raid0? I don't see why >> that would be... > > Though reading is the same in theory, I like RAID1 better ::-). If I > were you, I'd test all applicable configurations. But of course we > haven't even gotten into write speed... My preference will probably be raid10 - ie raid0 2 drives, raid0 another 2 drives, and then raid1 both raid0s. My 5th disk can be a hot spare. Sound reasonable? Alternatively, we could probably get a 6th disk and do raid1 on disk #5 & #6 and install the OS on that - keeping the application data separate. This would be ideal, I think. For some reason, I like to keep the OS separate from application data. Max. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: md faster than h/w? 2006-01-14 8:26 ` Max Waterman @ 2006-01-14 10:42 ` Michael Tokarev 2006-01-14 11:48 ` Max Waterman 2006-01-14 18:14 ` Mark Hahn 1 sibling, 1 reply; 22+ messages in thread From: Michael Tokarev @ 2006-01-14 10:42 UTC (permalink / raw) To: Max Waterman; +Cc: Ross Vandegrift, linux-raid Max Waterman wrote: [] > My preference will probably be raid10 - ie raid0 2 drives, raid0 > another 2 drives, and then raid1 both raid0s. My 5th disk can be a hot > spare. Round reasonable? Nononono. Never do that. Instead, create two raid1s and raid0 both, ie, just the opposite. Think about the two variants, and I hope you'll come to the reason why raid0(2x raid1) is more reliable than raid1(2x raid0). ;) > Alternatively, we could probably get a 6th disk and do raid1 on > disk #5 & #6 and install the OS on that - keeping the application > data separate. This would be ideal, I think. For some reason, I like > to keep os separate from application data. BTW, there's a raid10 module in current 2.6 kernels, which works somewhat differently compared with raid0(2x raid1) etc. /mjt ^ permalink raw reply [flat|nested] 22+ messages in thread
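Michael's ordering argument can be checked by brute force. The toy C program below enumerates every two-disk failure on a four-disk array, under the usual assumptions (independent failures, a mirror only dies when both of its members die); it is an illustration, not a statement about any particular controller.

#include <stdio.h>

/* Disks 0,1 form one pair/stripe half, disks 2,3 the other. */

/* raid0 over two raid1 pairs: dies only if an entire mirror pair is gone */
static int raid10_ok(int a, int b)
{
    return a / 2 != b / 2;        /* same pair => that mirror is dead => array dead */
}

/* raid1 over two raid0 stripes: survives only if at least one stripe is intact */
static int raid01_ok(int a, int b)
{
    return a / 2 == b / 2;        /* both failures in one stripe => other stripe intact */
}

int main(void)
{
    int total = 0, ok10 = 0, ok01 = 0;
    for (int a = 0; a < 4; a++)
        for (int b = a + 1; b < 4; b++) {
            total++;
            ok10 += raid10_ok(a, b);
            ok01 += raid01_ok(a, b);
        }
    printf("two-disk failures survived: raid0(2x raid1) %d/%d, raid1(2x raid0) %d/%d\n",
           ok10, total, ok01, total);   /* prints 4/6 vs 2/6 */
    return 0;
}

Same disks and same capacity either way, but striping over mirrors survives four of the six possible double failures while mirroring over stripes survives only two, which is why the order matters.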
* Re: md faster than h/w? 2006-01-14 10:42 ` Michael Tokarev @ 2006-01-14 11:48 ` Max Waterman 0 siblings, 0 replies; 22+ messages in thread From: Max Waterman @ 2006-01-14 11:48 UTC (permalink / raw) To: Michael Tokarev; +Cc: Ross Vandegrift, linux-raid Michael Tokarev wrote: > Max Waterman wrote: > [] >> My preference will probably be raid10 - ie raid0 2 drives, raid0 >> another 2 drives, and then raid1 both raid0s. My 5th disk can be a hot >> spare. Round reasonable? > > Nononono. Never do that. Instead, create two raid1s and raid0 > both, ie, just the opposite. Think about the two variants, and > I hope you'll come to the reason why raid0(2x raid1) is more > reliable than raid1(2x raid0). ;) Ah, yes. Right. I was aware of the 'difference', just had it backwards in my mind...oops. > >> Alternatively, we could probably get a 6th disk and do raid1 on >> disk #5 & #6 and install the OS on that - keeping the application >> data separate. This would be ideal, I think. For some reason, I like >> to keep os separate from application data. > > BTW, there's a raid10 module in current 2.6 kernels, which works > somewhat differently compared with raid0(2x raid1) etc. Oh? And how does that compare to stacking md arrays? Although I can create md devices using mdadm, I guess I'm not completely sure what actually does the work. Max. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: md faster than h/w? 2006-01-14 8:26 ` Max Waterman 2006-01-14 10:42 ` Michael Tokarev @ 2006-01-14 18:14 ` Mark Hahn 1 sibling, 0 replies; 22+ messages in thread From: Mark Hahn @ 2006-01-14 18:14 UTC (permalink / raw) To: Max Waterman; +Cc: linux-raid > My preference will probably be raid10 - ie raid0 2 drives, raid0 > another 2 drives, and then raid1 both raid0s. My 5th disk can be a hot > spare. Round reasonable? spares are pretty iffy if you are really worried about reliability, since the disk just sits there for potentially years, then suddenly it's recovered-onto, which is a lot of stress. it's not pleasant if the spare fails during recovery (though not fatal, just that you're still degraded/vulnerable.) that's the reason that most people tend towards R6 rather than R5+HS. admittedly, having a HS shared among multiple raids is more parsimonious, and R6 can be slow for write. > Alternatively, we could probably get a 6th disk and do raid1 on > disk #5 & #6 and install the OS on that - keeping the application > data separate. This would be ideal, I think. For some reason, I like > to keep os separate from application data. fragmenting your storage has drawbacks as well, since in the normal course of things, some fragment becomes too small. also bear in mind that higher-order raids (many disks, whether R0, R10, R6, etc) tend to want you to do very large transfers. if your traffic is small blocks, especially many interleaved streams of them, you might not benefit much from parallelism. (for instance, 64k is a pretty common raid stripe size, but if you have a 14-disk R6, you'd really like to be doing writes in multiples of 768K!) regards, mark hahn. ^ permalink raw reply [flat|nested] 22+ messages in thread
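The 768K figure is just data disks times chunk size. A two-line sanity check, using Mark's example numbers plus a hypothetical 64k chunk for the OP's four-disk raid0:

#include <stdio.h>

/* full-stripe width = data disks * chunk size; writes in multiples of this
 * keep all spindles busy and avoid read-modify-write on parity raid */
static void stripe(const char *name, int disks, int parity, int chunk_kb)
{
    printf("%-16s %d KB full stripe\n", name, (disks - parity) * chunk_kb);
}

int main(void)
{
    stripe("4-disk raid0", 4, 0, 64);    /* 256 KB (hypothetical chunk for the OP's array) */
    stripe("14-disk raid6", 14, 2, 64);  /* 768 KB (Mark's example)                        */
    return 0;
}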
* Re: md faster than h/w? 2006-01-13 14:46 ` Ross Vandegrift 2006-01-13 21:08 ` Lajber Zoltan 2006-01-14 1:19 ` Max Waterman @ 2006-01-14 1:22 ` Max Waterman 2 siblings, 0 replies; 22+ messages in thread From: Max Waterman @ 2006-01-14 1:22 UTC (permalink / raw) To: linux-raid Ross Vandegrift wrote: > On Fri, Jan 13, 2006 at 03:06:54PM +0800, Max Waterman wrote: >> One further strangeness is that our best results have been while using a >> uni-processor kernel - 2.6.8. We would prefer it if our best results >> were with the most recent kernel we have, which is 2.6.15, but no. > > Sounds like this is probably a bug. If you have some time to play > around with it, I'd try kernels in between and find out exactly where > the regression happened. The bug will probably be cleaned up quickly > and performance will be back where it should be. > >> So, any advice on how to obtain best performance (mainly web and mail >> server stuff)? >> Is 180MB/s-200MB/s a reasonable number for this h/w? >> What numbers do other people see on their raid0 h/w? >> Any other advice/comments? > > My employer usues the 1850 more than the 2850, though we do have a few > in production. My feeling is that 180-200MB/sec is really excellent > throughput. > > We're comparing apples to oranges, but it'll at least give you an > idea. The Dell 1850s are sortof our highest class of machine that we > commonly deploy. We have a Supermicro chassis that's exactly like > the 1850 but SATA instead of SCSI. On the low-end, we have various P4 > Prescott chassis. > > Just yesterday I was testing disk performance on a low-end box. SATA > on a 3Ware controller, RAID1. I was quite pleased to be getting > 70-80MB/sec. > > So my feeling is that your numbers are fairly close to where they > should be. Faster procs, SCSI, and a better RAID card. However, I'd > also try RAID1 if you're mostly interested in read speed. Remember > that RAID1 lets you balance reads across disks, whereas RAID0 will > require each disk in the array to retrieve the data. > OK, this sounds good. I still wonder where all the theoretical numbers went though. The scsi channel should be able to handle 320MB/s, and we should have enough disks to push that (each disk is 147-320MB/s and we have 4 of them) - theoretically. Why does the bandwidth seem to plateau with two disks - adding more into the raid0 doesn't seem to improve performance at all? Why do I get better numbers using the file for the whole device (is there a better name for it), rather than for a partition (ie /dev/sdb is faster than /dev/sdb1 - by a lot)? Can you explain why raid1 would be faster than raid0? I don't see why that would be... Things I have to try from your email so far are: 1) raid1 - s/w and h/w (we don't care much about capacity, so it's ok) 2) raid0 - h/w, with bonnie++ using no partition table 3) kernels in between 2.6.8 and 2.6.15 Max. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: md faster than h/w? 2006-01-13 7:06 md faster than h/w? Max Waterman 2006-01-13 14:46 ` Ross Vandegrift @ 2006-01-14 6:40 ` Mark Hahn 2006-01-14 8:54 ` Max Waterman ` (2 more replies) 1 sibling, 3 replies; 22+ messages in thread From: Mark Hahn @ 2006-01-14 6:40 UTC (permalink / raw) To: Max Waterman; +Cc: linux-raid > have 5 Fujitsu MAX3073NC drives attached to one of it's channels (can't ... > According to Fujitsu's web site, the disks can each do internal IO at > upto 147MB/s, and burst upto 320MB/s. According to the LSI Logic web the meaning of "internal transfer rate" is a bit fuzzy - let's assume it means the raw bandwidth coming off the head, before encoding and ECC overhead. I believe the real sustained transfer rate (at the scsi connector) would be under 100 MB/s, decreasing noticably on inner tracks. note also that the MegaRAID SCSI 320-2 is just a 64x66 card, meaning its peak theoretical bandwidth is 533 MB/s, and you probably should expect closer to 50% of that peak under the best circumstances. have you examined your PCI topology, as well as some of the tweakable settings like the latency timer? the 2850 seems to be a pretty reasonable server, and should certainly not be starving you for host memory bandwidth, for instance. > So, we're trying to measure the performance. We've been using 'bonnie++' > and 'hdparm -t'. they each have flaws. I prefer to attempt to get more basic numbers by ignoring filesystem issues entirely, ignoring seek rates, and measuring pure read/write streaming bandwidth. I've written a fairly simple bandwidth-reporting tool: http://www.sharcnet.ca/~hahn/iorate.c it prints incremental bandwidth, which I find helpful because it shows recording zones, like this slightly odd Samsung: http://www.sharcnet.ca/~hahn/sp0812c.png > Initially, we were getting 'hdparm -t' numbers around 80MB/s, but this > was when we were testing /dev/sdb1 - the (only) partition on the device. > When we started testing /dev/sdb, it increased significantly to around > 180MB/s. I'm not sure what to conclude from this. there are some funny interactions between partitions, filesystems and low-level parameters like readahead. > Using theoretical numbers as a maximum, we should be able to read at the > greater of 4 times a single drive speed (=588MB/s) or the SCSI bus speed > (320MB/s) ie 320MB/s. you should really measure the actual speed of one drive alone first. I'd guess it starts at ~90 MB/s and drops to 70 or so.. > Doing this initially resulted in a doubling of bonnie++ speed at over > 200MB/s, though I have been unable to reproduce this result - the most > common result is still about 180MB/s. 200 is pretty easy to achieve using MD raid0, and pretty impressive for hardware raid, at least traditionally. there are millions and millions of hardware raid solutions out there that wind up being disgustingly slow, with very little correlation to price, marketing features, etc. you can pretty safely assume that older HW raid solutions suck, though: the only ones I've seen perform well are new or fundamental redesigns happening in the last ~2 years. I suspect you can actually tell a lot about the throughput of a HW raid solution just by looking at the card: estimate the local memory bandwidth. for instance, the Megaraid, like many HW raid cards, takes a 100 MHz ECC sdram dimm, which means it has 2-300 MB/s to work with. compare this to a (new) 3ware 9550 card, which has ddr2/400, (8x peak bandwidth, I believe - it actually has BGA memory chips on both sides of the board like a GPU...) 
> One further strangeness is that our best results have been while using a > uni-processor kernel - 2.6.8. We would prefer it if our best results > were with the most recent kernel we have, which is 2.6.15, but no. hmm, good one. I haven't scrutinized the changelogs in enough detail, but I don't see a lot of major overhaul happening. how much difference are you talking about? > So, any advice on how to obtain best performance (mainly web and mail > server stuff)? do you actually need large/streaming bandwidth? best performance is when the file is in page cache already, which is why it sometimes makes sense to put lots of GB into this kind of machine... > Is 180MB/s-200MB/s a reasonable number for this h/w? somewhat, but it's not really a high-performance card. it might be instructive to try a single disk, then 2x raid0, then 3x, 4x. I'm guessing that you get most of that speed with just 2 or 3 disks, and that adding the fourth is hitting a bottleneck, probably on the card. > What numbers do other people see on their raid0 h/w? I'm about to test an 8x 3ware 9550 this weekend. but 4x disks on a $US 60 promise tx2 will already beat your system ;) ^ permalink raw reply [flat|nested] 22+ messages in thread
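For anyone who wants the gist of an incremental-bandwidth probe without fetching iorate.c, the sketch below is not Mark's tool, just the same idea stripped to its bones: read the device sequentially in 1MB chunks and print MB/s for each 256MB of progress. Run it against the raw /dev/sdX as root; it reads through the page cache, so readahead is in play, and you can interrupt it once you have seen enough of the curve.

/* minimal streaming-read bandwidth probe - a sketch, not Mark's iorate.c */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

#define CHUNK (1 << 20)     /* read 1 MB at a time */
#define STEP  256           /* report every 256 MB */

static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s /dev/sdX\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char *buf = malloc(CHUNK);
    double t0 = now(), tprev = t0;
    long long mb;

    for (mb = 1; read(fd, buf, CHUNK) == CHUNK; mb++) {
        if (mb % STEP == 0) {
            double t = now();
            printf("%6lld MB  incremental %6.1f MB/s  average %6.1f MB/s\n",
                   mb, STEP / (t - tprev), mb / (t - t0));
            tprev = t;
        }
    }
    free(buf);
    close(fd);
    return 0;
}

Plotting the incremental column against offset shows the zoned-recording slope and any flat ceiling imposed by the controller or bus, which is the point Mark makes later in the thread.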
* Re: md faster than h/w? 2006-01-14 6:40 ` Mark Hahn @ 2006-01-14 8:54 ` Max Waterman 0 siblings, 0 replies; 22+ messages in thread From: Max Waterman @ 2006-01-14 8:54 UTC (permalink / raw) To: Mark Hahn; +Cc: linux-raid Mark Hahn wrote: >> have 5 Fujitsu MAX3073NC drives attached to one of it's channels (can't > ... >> According to Fujitsu's web site, the disks can each do internal IO at >> upto 147MB/s, and burst upto 320MB/s. According to the LSI Logic web > > the meaning of "internal transfer rate" is a bit fuzzy - > let's assume it means the raw bandwidth coming off the head, > before encoding and ECC overhead. I believe the real sustained > transfer rate (at the scsi connector) would be under 100 MB/s, > decreasing noticably on inner tracks. OK. > > note also that the MegaRAID SCSI 320-2 is just a 64x66 card, > meaning its peak theoretical bandwidth is 533 MB/s, and you right. I figured that was above the SCSI's bus limit anyway, so it wasn't relevant; though, I suppose, SCSI might be more 'efficient' and so achieve closer to its theoretical than PCI <shrug>. > probably should expect closer to 50% of that peak under the > best circumstances. have you examined your PCI topology, as well > as some of the tweakable settings like the latency timer? I had a quick look at what else was on the PCI bus, but I didn't linger on it...there's nothing significant in the system, though there are two gigabit ethernet ports which can probably affect things (we're using the second, which I guess is more likely to be on the PCI bus than the first). I have no experience of looking at PCI topology. Can you give me some pointers on what I should be looking for? The only systems I've looked at before are nForce3 250Gb, where the network load is taken off the PCI bus by the nVidia MCP. With this system, I think they're both on the PCI bus proper. In any case, we've not been exercising the network, so I don't suppose it consumes anything noticeable. Also, they're only connected to a 100Mbps/full switch, so they're not doing gigabit. I think there's another SCSI adapter - built-in - but nothing attached to it. > the 2850 seems to be a pretty reasonable server, and should > certainly not be starving you for host memory bandwidth, for instance. Good. It seems to be fairly well loaded. 4GB RAM, plus 2x 3GHz Xeons. Linux seems to think it has 4 CPUs, so I suspect that's due to hyper-threading or whatever it's called. > >> So, we're trying to measure the performance. We've been using 'bonnie++' >> and 'hdparm -t'. > > they each have flaws. I prefer to attempt to get more basic numbers > by ignoring filesystem issues entirely, ignoring seek rates, and > measuring pure read/write streaming bandwidth. I've written a fairly > simple bandwidth-reporting tool: > http://www.sharcnet.ca/~hahn/iorate.c Cool. I'll give that a shot on Monday :D > > it prints incremental bandwidth, which I find helpful because it shows > recording zones, like this slightly odd Samsung: > http://www.sharcnet.ca/~hahn/sp0812c.png Interesting :) > >> Initially, we were getting 'hdparm -t' numbers around 80MB/s, but this >> was when we were testing /dev/sdb1 - the (only) partition on the device. >> When we started testing /dev/sdb, it increased significantly to around >> 180MB/s. I'm not sure what to conclude from this. > > there are some funny interactions between partitions, filesystems > and low-level parameters like readahead.
> >> Using theoretical numbers as a maximum, we should be able to read at the >> greater of 4 times a single drive speed (=588MB/s) or the SCSI bus speed >> (320MB/s) ie 320MB/s. > > you should really measure the actual speed of one drive alone first. > I'd guess it starts at ~90 MB/s and drops to 70 or so.. Yes, I've done that, and your numbers seem pretty typical of what I've been measuring. > >> Doing this initially resulted in a doubling of bonnie++ speed at over >> 200MB/s, though I have been unable to reproduce this result - the most >> common result is still about 180MB/s. > > 200 is pretty easy to achieve using MD raid0, and pretty impressive for > hardware raid, at least traditionally. there are millions and millions > of hardware raid solutions out there that wind up being disgustingly > slow, with very little correlation to price, marketing features, etc. > you can pretty safely assume that older HW raid solutions suck, though: > the only ones I've seen perform well are new or fundamental redesigns > happening in the last ~2 years. If 200 is easy to achieve with MD raid0, then I'd guess that I'm hitting a bottleneck somewhere other than the disks. Perhaps it's the SCSI bus bandwidth (since it's lower than the PCI bus). In which case, trying to use the second channel would help - IINM, we'd need some extra h/w in order to do that with our SCSI backplane in the 2850 (though it's far from clear). The SCSI controller has two channels, but the SCSI backplane is all one - the extra h/w enables a 4+2 mode, I think, which would be ideal, IMO. > I suspect you can actually tell a lot about the throughput of a HW > raid solution just by looking at the card: estimate the local memory > bandwidth. for instance, the Megaraid, like many HW raid cards, > takes a 100 MHz ECC sdram dimm, which means it has 2-300 MB/s to work > with. compare this to a (new) 3ware 9550 card, which has ddr2/400, > (8x peak bandwidth, I believe - it actually has BGA memory chips on both > sides of the board like a GPU...) Hrm. It seems like a good indicator. I'll take a look at the DIMM to see what speed it is (maybe we put in one that is too slow or something). I did note that it only has 128MB - I think it can take 1GB, IIRC. I'm not sure what difference the additional RAM will make - is it just cache, or is it used for RAID calculations? > >> One further strangeness is that our best results have been while using a >> uni-processor kernel - 2.6.8. We would prefer it if our best results >> were with the most recent kernel we have, which is 2.6.15, but no. > > hmm, good one. I haven't scrutinized the changelogs in enough detail, > but I don't see a lot of major overhaul happening. how much difference > are you talking about? Something like 20MB/s slower...often less, but still, always less. > >> So, any advice on how to obtain best performance (mainly web and mail >> server stuff)? > > do you actually need large/streaming bandwidth? best performance > is when the file is in page cache already, which is why it sometimes > makes sense to put lots of GB into this kind of machine... I don't think we need large/streaming bandwidth; it's just a measure we're using. Indeed, more memory is good. We had 4GB, which seems like a lot to me, though it can take more. >> Is 180MB/s-200MB/s a reasonable number for this h/w? > > somewhat, but it's not really a high-performance card. it might be > instructive to try a single disk, then 2x raid0, then 3x, 4x.
> I'm guessing that you get most of that speed with just 2 or 3 disks, and that > adding the fourth is hitting a bottleneck, probably on the card. Yes, I did that. Actually, it looked like adding the 3rd made little difference. > >> What numbers do other people see on their raid0 h/w? > > I'm about to test an 8x 3ware 9550 this weekend. but 4x disks on a $US 60 > promise tx2 will already beat your system ;) Ug :( Max. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: md faster than h/w? 2006-01-14 6:40 ` Mark Hahn 2006-01-14 8:54 ` Max Waterman @ 2006-01-14 21:23 ` Ross Vandegrift 2006-01-16 4:37 ` Max Waterman 2006-01-16 6:31 ` Max Waterman 2 siblings, 1 reply; 22+ messages in thread From: Ross Vandegrift @ 2006-01-14 21:23 UTC (permalink / raw) To: Mark Hahn; +Cc: Max Waterman, linux-raid On Sat, Jan 14, 2006 at 01:40:53AM -0500, Mark Hahn wrote: > > Initially, we were getting 'hdparm -t' numbers around 80MB/s, but this > > was when we were testing /dev/sdb1 - the (only) partition on the device. > > When we started testing /dev/sdb, it increased significantly to around > > 180MB/s. I'm not sure what to conclude from this. > > there are some funny interactions between partitions, filesystems > and low-level parameters like readahead. Hmmm, I'm not convinced, though it could be that the disks in my workstation are not fast enough. I used hdparm and your iorate program to compare the performance on my fastest disk (7200rpm, ATA100). The difference between partition vs. disk is definitely within the margin of error: 2-3MB/sec when I'm averaging around 50MB/sec. I'd be suspicious of much more difference between the two... > I'm about to test an 8x 3ware 9550 this weekend. but 4x disks on a $US 60 > promise tx2 will already beat your system ;) No way. Cause you're gonna max the PCI bus if you use that card with MD. Say you're running on a 66MHz PCI bus, you'll max at 266MB/sec. For four disks, 266/4 = 66.5MB/sec, you're already slower than the original poster's RAID. Add network traffic to that equation, and Promise is laughable. The 3Ware will rock considerably, as it's real hardware, so you'll only send one copy of the data. On top of which, the PCIX will be hard to fill up. I've seen 32-bit, 33MHz 3Ware cards hold 80MB/sec without breaking a sweat. If there's a PCI-X or PCI Express version of the TX4, then just pretend I didn't post this ::-) -- Ross Vandegrift ross@lug.udel.edu "The good Christian should beware of mathematicians, and all those who make empty prophecies. The danger already exists that the mathematicians have made a covenant with the devil to darken the spirit and to confine man in the bonds of Hell." --St. Augustine, De Genesi ad Litteram, Book II, xviii, 37 ^ permalink raw reply [flat|nested] 22+ messages in thread
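Ross's bus arithmetic, written out: theoretical PCI bandwidth is bus width times clock, and with software RAID on a plain HBA every byte of disk traffic crosses that bus once, so the per-disk ceiling is roughly the bus peak divided by the number of members. Rough, theoretical numbers only:

#include <stdio.h>

int main(void)
{
    /* theoretical bus bandwidth = (width / 8) * clock; real-world is well below this */
    struct { const char *bus; int bits, mhz; } slots[] = {
        { "PCI 32-bit/33MHz",   32, 33 },
        { "PCI 32-bit/66MHz",   32, 66 },   /* Ross's ~266 MB/s case           */
        { "PCI-X 64-bit/66MHz", 64, 66 },   /* the MegaRAID/3ware class of slot */
    };
    int members = 4;    /* disks in the software raid0 */

    for (int i = 0; i < 3; i++) {
        double peak = slots[i].bits / 8.0 * slots[i].mhz;
        printf("%-20s peak %4.0f MB/s, ~%3.0f MB/s per disk across %d members\n",
               slots[i].bus, peak, peak / members, members);
    }
    return 0;
}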
* Re: md faster than h/w? 2006-01-14 21:23 ` Ross Vandegrift @ 2006-01-16 4:37 ` Max Waterman 2006-01-16 5:33 ` Max Waterman 0 siblings, 1 reply; 22+ messages in thread From: Max Waterman @ 2006-01-16 4:37 UTC (permalink / raw) To: Ross Vandegrift; +Cc: linux-raid Ross Vandegrift wrote: > On Sat, Jan 14, 2006 at 01:40:53AM -0500, Mark Hahn wrote: >>> Initially, we were getting 'hdparm -t' numbers around 80MB/s, but this >>> was when we were testing /dev/sdb1 - the (only) partition on the device. >>> When we started testing /dev/sdb, it increased significantly to around >>> 180MB/s. I'm not sure what to conclude from this. >> there are some funny interactions between partitions, filesystems >> and low-level parameters like readahead. > > Hmmm, I'm not convinced, though it could be that the disks in my > workstation are not fast enough. > > I used hdparm and your iorate program to compare the performance on my > fastest disk (7200rpm, ATA100). The difference between partition vs. > disk is definitely within the margin of error: 2-3MB/sec when I'm > averaging around 50MB/sec. > > I'd be suspicious of much more difference between the two... In which case, I'm suspicious - using 'hdparm -t' on h/w RAID0 (4 disks):

/dev/sdb:
 Timing buffered disk reads:  536 MB in  3.00 seconds = 178.57 MB/sec
/dev/sdb1:
 Timing buffered disk reads:  100 MB in  3.01 seconds =  33.19 MB/sec

That's a big difference in my book. However, with bonnie++, using filesystems created on the above devices, I get similar numbers:

/dev/sdb:
            --Sequential Input--
            -Per Chr-  --Block--
            K/sec %CP  K/sec %CP
            38586  76 126818  15

/dev/sdb1:
            --Sequential Input--
            -Per Chr-  --Block--
            K/sec %CP  K/sec %CP
            38185  76 127569  15

After running that, I reran hdparm, and it reported ~40MB/s for *both* /dev/sdb and /dev/sdb1. Then I unmount /dev/sdb and it's back up to 155MB/s !?!?! It's not making any sense to me :( Strange. I guess I should just ignore 'hdparm -t'? Max. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: md faster than h/w? 2006-01-16 4:37 ` Max Waterman @ 2006-01-16 5:33 ` Max Waterman 2006-01-16 14:12 ` Andargor 0 siblings, 1 reply; 22+ messages in thread From: Max Waterman @ 2006-01-16 5:33 UTC (permalink / raw) Cc: linux-raid Max Waterman wrote: > > Then I unmount /dev/sdb and it's back up to 155MB/s !?!?! > > It's not making any sense to me :( > OK, the 'hdparm -t' numbers are always low for a device that is mounted. i.e. an unmounted device gets 3-4 times the bandwidth of a mounted device. Of course, bonnie++ only works on mounted devices, but gives me reasonable (but not great) numbers (130MB/s) which don't seem to vary too much with the kernel version. Max. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: md faster than h/w? 2006-01-16 5:33 ` Max Waterman @ 2006-01-16 14:12 ` Andargor 2006-01-17 9:18 ` Max Waterman 0 siblings, 1 reply; 22+ messages in thread From: Andargor @ 2006-01-16 14:12 UTC (permalink / raw) To: Max Waterman; +Cc: linux-raid --- Max Waterman <davidmaxwaterman+gmane@fastmail.co.uk> wrote: > Of course, bonnie++ only works on mounted devices, > but gives me > reasonable (but not great) numbers (130MB/s) which > don't seem to vary > too much with the kernel version. Out of curiosity, have you compared bonnie++ results with and without -f (fast)? I've found that it seems to report x2 read throughput without -f. Perhaps it "warms up" the drive with putc() and getc(), allowing the kernel and/or cache to do its job? I haven't found a benchmark that is 100% reliable/comparable. Of course, it all depends how the drive is used in production, which may have little correlation with the benchmarks... Andargor __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: md faster than h/w? 2006-01-16 14:12 ` Andargor @ 2006-01-17 9:18 ` Max Waterman 2006-01-17 17:09 ` Andargor 0 siblings, 1 reply; 22+ messages in thread From: Max Waterman @ 2006-01-17 9:18 UTC (permalink / raw) To: linux-raid Andargor wrote: > > --- Max Waterman > <davidmaxwaterman+gmane@fastmail.co.uk> wrote: > >>Of course, bonnie++ only works on mounted devices, >>but gives me >>reasonable (but not great) numbers (130MB/s) which >>don't seem to vary >>too much with the kernel version. > > > Out of curiosity, have you compared bonnie++ results > with and without -f (fast)? Nope. Just used default options (as well as the '-u' and '-d' options, of course). > > I've found that it seems to report x2 read throughput > without -f. Perhaps it "warms up" the drive with > putc() and getc(), allowing the kernel and/or cache to > do its job? Hrm. > > I haven't found a benchmark that is 100% > reliable/comparable. Of course, it all depends how the > drive is used in production, which may have little > correlation with the benchmarks... Indeed. Do you think that if it is configured for the best possible read performance, then that would be its worst possible write performance? I was hoping that having it configured for good read perf. would mean it was pretty good for write too.... Max. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: md faster than h/w? 2006-01-17 9:18 ` Max Waterman @ 2006-01-17 17:09 ` Andargor 2006-01-18 4:43 ` Max Waterman 0 siblings, 1 reply; 22+ messages in thread From: Andargor @ 2006-01-17 17:09 UTC (permalink / raw) To: Max Waterman, linux-raid --- Max Waterman <davidmaxwaterman+gmane@fastmail.co.uk> wrote: > Andargor wrote: > > > > I haven't found a benchmark that is 100% > > reliable/comparable. Of course, it all depends how > the > > drive is used in production, which may have little > > correlation with the benchmarks... > > Indeed. > > Do you think that if it is configured for the best > possible > read performance, then that would be it's worst > possible > write performance? > > I was hoping that having it configured for good read > perf. > would mean it was pretty good for write too.... > > Max. > I don't have nearly the expertise some people here show, but intuitively I don't think that's true. If anything, it would be the opposite, unless write caching was as good as read caching (both h/w and kernel). Also, the number of disks you have to write to or read from depending on RAID level has an impact. And as Mark Hahn has indicated, the actual location on disk you are reading/writing has an impact as well. Difficult to evaluate objectively. So, basically, I don't have an answer to that. :) Andargor __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: md faster than h/w? 2006-01-17 17:09 ` Andargor @ 2006-01-18 4:43 ` Max Waterman 0 siblings, 0 replies; 22+ messages in thread From: Max Waterman @ 2006-01-18 4:43 UTC (permalink / raw) To: linux-raid Andargor wrote: > > --- Max Waterman > <davidmaxwaterman+gmane@fastmail.co.uk> wrote: > >> Andargor wrote: >>> I haven't found a benchmark that is 100% >>> reliable/comparable. Of course, it all depends how >> the >>> drive is used in production, which may have little >>> correlation with the benchmarks... >> Indeed. >> >> Do you think that if it is configured for the best >> possible >> read performance, then that would be it's worst >> possible >> write performance? >> >> I was hoping that having it configured for good read >> perf. >> would mean it was pretty good for write too.... >> >> Max. >> > > I don't have nearly the expertise some people here > show, but intuitively I don't think that's true. If > anything, it would be the opposite, unless write > caching was as good as read caching (both h/w and > kernel). Ok. I wonder if it's possible to have the best possible read performance, and the worst possible write performance at the same time? I'm noticing these messages in the dmesg output:

  sda: asking for cache data failed
  sda: assuming drive cache: write through

We've set the raid drive to be write-back for better bandwidth, but if sd is assuming write through, I wonder what impact that will have on write performance? ... but I've asked that in a separate message already. > Also, the number of disks you have to write > to or read from depending on RAID level has an impact. I'm assuming more is better? We're trying to get an extra one to make it up to 6. What RAID should we use for best write bandwidth? I'm assuming RAID5 isn't the best...doesn't it have to touch every disk for a write - ie no benefit over a single disk? > And as Mark Hahn has indicated, the actual location on > disk you are reading/writing has an impact as well. > Difficult to evaluate objectively. Yes, but I don't see that I have much control over that in the end system...or do I? I suppose I could partition for performance - sounds messy. Max. ^ permalink raw reply [flat|nested] 22+ messages in thread
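On the RAID5 write question: a small write does not touch every disk, but it does carry a penalty, because parity is updated as new_parity = old_parity XOR old_data XOR new_data - typically two reads plus two writes for one small write, while a full-stripe write avoids the reads entirely. The toy C program below illustrates the parity math only (byte arrays standing in for chunks; it has nothing to do with the actual md or Perc firmware code):

#include <stdio.h>
#include <string.h>

#define CHUNK 8   /* toy chunk size */

int main(void)
{
    unsigned char d0[CHUNK] = "AAAAAAA", d1[CHUNK] = "BBBBBBB", d2[CHUNK] = "CCCCCCC";
    unsigned char parity[CHUNK], new_d1[CHUNK] = "ZZZZZZZ";

    /* initial parity over the three data chunks */
    for (int i = 0; i < CHUNK; i++)
        parity[i] = d0[i] ^ d1[i] ^ d2[i];

    /* small write to d1: read old d1 + old parity, compute, write both back.
     * only the d1 disk and the parity disk are touched - 2 reads, 2 writes. */
    for (int i = 0; i < CHUNK; i++)
        parity[i] ^= d1[i] ^ new_d1[i];
    memcpy(d1, new_d1, CHUNK);

    /* verify: recomputing parity from scratch gives the same answer */
    int ok = 1;
    for (int i = 0; i < CHUNK; i++)
        ok &= (parity[i] == (d0[i] ^ d1[i] ^ d2[i]));
    printf("parity update consistent: %s\n", ok ? "yes" : "no");
    return 0;
}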
* Re: md faster than h/w? 2006-01-14 6:40 ` Mark Hahn 2006-01-14 8:54 ` Max Waterman 2006-01-14 21:23 ` Ross Vandegrift @ 2006-01-16 6:31 ` Max Waterman 2006-01-16 13:30 ` Ric Wheeler 2 siblings, 1 reply; 22+ messages in thread From: Max Waterman @ 2006-01-16 6:31 UTC (permalink / raw) To: linux-raid Mark Hahn wrote: > I've written a fairly > simple bandwidth-reporting tool: > http://www.sharcnet.ca/~hahn/iorate.c > > it prints incremental bandwidth, which I find helpful because it shows > recording zones, like this slightly odd Samsung: > http://www.sharcnet.ca/~hahn/sp0812c.png > Using iorate.c, I get somewhat different numbers for the 2.6.15 kernel than for the 2.6.8 kernel - the 2.6.15 kernel starts off at 105MB/s and heads down to 94MB/s, while 2.6.8 starts at 140MB/s and heads down to 128MB/s. That seems like a significant difference to me? What to do? Max. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: md faster than h/w? 2006-01-16 6:31 ` Max Waterman @ 2006-01-16 13:30 ` Ric Wheeler 2006-01-16 14:08 ` Mark Hahn 0 siblings, 1 reply; 22+ messages in thread From: Ric Wheeler @ 2006-01-16 13:30 UTC (permalink / raw) To: Max Waterman; +Cc: linux-raid, Butler, Tim Max Waterman wrote: > Mark Hahn wrote: > >> I've written a fairly >> simple bandwidth-reporting tool: >> http://www.sharcnet.ca/~hahn/iorate.c >> >> it prints incremental bandwidth, which I find helpful because it shows >> recording zones, like this slightly odd Samsung: >> http://www.sharcnet.ca/~hahn/sp0812c.png >> > > Using iorate.c, I guess somewhat different numbers for the 2.6.15 > kernel than > for the 2.6.8 kernel - the 2.6.15 kernel starts off at 105MB/s and > head down > to 94MB/s, while 2.6.8 starts at 140MB/s and heads town to 128MB/s. > > That seems like a significant difference to me? > > What to do? > > Max. > Keep in mind that disk performance is very dependent on exactly what your IO pattern looks like and which part of the disk you are reading. For example, you should be able to consistently max out the bus if you write a relatively small (say 8MB) block of data to a disk and then (avoiding the buffer cache) do direct IO reads to read it back. This test is useful for figuring out if we have introduced any IO performance bumps as all of the data read should come directly from the disk cache and not require any head movement, platter reads, etc. You can repeat this test for each of the independent drives in your system. It is also important to keep in mind that different parts of your disk platter have different maximum throughput rates. For example, reading from the outer sectors on a platter will give you a significantly different profile than reading from the inner sectors on a platter. We have some tests that we use to measure raw disk performance that try to get through these hurdles to measure performance in a consistent and reproducible way... ric ^ permalink raw reply [flat|nested] 22+ messages in thread
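Here is a minimal sketch of the kind of read-back test Ric describes - O_DIRECT so the page cache is bypassed and the data comes back across the bus from the drive/controller cache. It is hypothetical and not Ric's actual test code; it also skips the initial write and simply rereads the first 8MB of the device, which is close enough for a read-path check.

/* sketch of an O_DIRECT read-back test, not a production benchmark */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

#define MB      (1024 * 1024)
#define REGION  (8 * MB)        /* the "relatively small" block Ric mentions */
#define IOSIZE  (1 * MB)
#define PASSES  16

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s /dev/sdX\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    void *buf;
    if (posix_memalign(&buf, 4096, IOSIZE)) { perror("posix_memalign"); return 1; }

    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    for (int p = 0; p < PASSES; p++) {                 /* reread the same 8 MB repeatedly */
        for (off_t off = 0; off < REGION; off += IOSIZE)
            if (pread(fd, buf, IOSIZE, off) != IOSIZE) { perror("pread"); return 1; }
    }
    gettimeofday(&t1, NULL);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    printf("%.1f MB/s (repeated O_DIRECT reads of the first %d MB)\n",
           PASSES * REGION / (double)MB / secs, REGION / MB);
    close(fd);
    return 0;
}

Since the same small region is read over and over, little or no head movement is involved, so the result approximates the bus/cache ceiling rather than platter speed, which is the distinction Ric is drawing.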
* Re: md faster than h/w? 2006-01-16 13:30 ` Ric Wheeler @ 2006-01-16 14:08 ` Mark Hahn 0 siblings, 0 replies; 22+ messages in thread From: Mark Hahn @ 2006-01-16 14:08 UTC (permalink / raw) To: Ric Wheeler; +Cc: Max Waterman, linux-raid > > Using iorate.c, I guess somewhat different numbers for the 2.6.15 > > kernel than > > for the 2.6.8 kernel - the 2.6.15 kernel starts off at 105MB/s and > > head down > > to 94MB/s, while 2.6.8 starts at 140MB/s and heads town to 128MB/s. > > > > That seems like a significant difference to me? yes that's surprising. I should have mentioned that the way I normally use iorate output is to plot the incremental bandwidth as a function of position (disk offset). that way I can clearly see contributions of kernel page-cache, possible flattening due to a bottleneck, and the normal zoned-recording curve.

#!/usr/bin/perl
use strict;
for my $fname (@ARGV) {
    open(I,"<$fname");
    open(INC,">$fname.inc");
    open(AVG,">$fname.avg");
    while (<I>) {
        my @fields = split;
        if ($#fields == 3 && /[0-9]$/) {
            print INC "$fields[1] $fields[2]\n";
            print AVG "$fields[1] $fields[3]\n";
        }
    }
    close(AVG);
    close(INC);
}

I sometimes plot the running average curve as well, since it shows how much less informative the average (ala bonnie/iozone/etc) is. > Keep in mind that disk performance is very dependent on exactly what > your IO pattern looks like and which part of the disk you are reading. that's the main point of using iorate. > We have some tests that we use to measure raw disk performance that try > to get through these hurdles to measure performance in a consistent and > reproducible way... iorate profiles are reasonably consistent, as well. it doesn't attempt to do any IO pattern except streaming reads or writes. ^ permalink raw reply [flat|nested] 22+ messages in thread