* raid10 vs raid5 - strange performance
@ 2008-03-25 21:33 Christian Pernegger
  2008-03-25 22:13 ` David Rees
  2008-03-25 23:36 ` Keld Jørn Simonsen
  0 siblings, 2 replies; 19+ messages in thread
From: Christian Pernegger @ 2008-03-25 21:33 UTC (permalink / raw)
  To: Linux RAID

I did a few tests of raid10 on an old 3ware 7506-8 today (using 6
drives), both with the card's built-in RAID 10 and with md's raid10,n2,
at chunk sizes from 64KB to 1024KB.
- caches dropped before each test
- averages of three runs
- arrays in synced state
- dd tests: 6GB (3MB bs)


dd-reads:
single drive: 64 MB/s
3w: 104-114 MB/s
md: 113-115 MB/s

bonnie++-reads:
single drive: 36 MB/s
3w: 94-97 MB/s
md: 101-108 MB/s

The card unfortunately sits in a regular PCI slot, so this is not bad considering.

dd-writes:
single drive: 67 MB/s
3w: 42 MB/s
md: 42 MB/s

bonnie++-writes:
single drive: 36 MB/s
3w:  40-41 MB/s
md:  41 MB/s

3w and md speeds are pretty much identical and just barely above that of
a single drive. If the cause were md having to send the same data to
multiple disks over the bus, the 3ware should have an advantage, but it
does not.

So I ran the same tests using a md-raid5 over the same disks just for
kicks (512KB chunks, no bitmap):

dd-reads: 115 MB/s
bonnie++-reads: 87  MB/s
dd-writes: 69 MB/s
bonnie++-writes: 62  MB/s

Writes are actually a lot better than any raid10 ... despite all the
hype it gets on the list. I wanted to go with raid10 for this box
because it's not mostly-read for a change.

Explanations welcome.

Chris


* Re: raid10 vs raid5 - strange performance
  2008-03-25 21:33 raid10 vs raid5 - strange performance Christian Pernegger
@ 2008-03-25 22:13 ` David Rees
  2008-03-25 22:56   ` Christian Pernegger
  2008-03-25 23:36 ` Keld Jørn Simonsen
  1 sibling, 1 reply; 19+ messages in thread
From: David Rees @ 2008-03-25 22:13 UTC (permalink / raw)
  To: Christian Pernegger; +Cc: Linux RAID

On Tue, Mar 25, 2008 at 2:33 PM, Christian Pernegger
<pernegger@gmail.com> wrote:
>  So I ran the same tests using a md-raid5 over the same disks just for
>  kicks (512KB chunks, no bitmap):
>
>  dd-reads: 115 MB/s
>  bonnie++-reads: 87  MB/s
>  dd-writes: 69 MB/s
>  bonnie++-writes: 62  MB/s
>
>  Writes are actually a lot better than any raid10 ... despite all the
>  hype it gets on the list. I wanted to go with raid10 for this box
>  because it's not mostly-read for a change.

Not surprising at all. Read performance is similar between the two
setups as expected (appears to be limited by the PCI bus).

Streaming write performance is better because you are writing less
redundant data to disks, you can now stripe writes over 5 disks
instead of 3. So your performance increase there also looks
appropriate.

Your benchmarks are simulating the best case scenario for RAID5.

-Dave


* Re: raid10 vs raid5 - strange performance
  2008-03-25 22:13 ` David Rees
@ 2008-03-25 22:56   ` Christian Pernegger
  2008-03-26 16:29     ` Bill Davidsen
  0 siblings, 1 reply; 19+ messages in thread
From: Christian Pernegger @ 2008-03-25 22:56 UTC (permalink / raw)
  To: David Rees; +Cc: Linux RAID

>  Not surprising at all. Read performance is similar between the two
>  setups as expected (appears to be limited by the PCI bus).

Yes, reads are fine.

>  Streaming write performance is better because you are writing less
>  redundant data to disks, you can now stripe writes over 5 disks
>  instead of 3.

Sounds reasonable. Write performance SHOULD be ~5x single disk for
raid5 and ~3x single disk for raid10 in a theoretical best-case
scenario; either should hit the PCI bus cap. In reality the ratios are
more like 1x for raid5 and 0.6-1.1x for raid10. It's just that I'd
expected roughly identical and significantly better streaming write
performance from both array configurations ... then again, if you
think it over,

raid5 will transfer 1 parity block per 5 data blocks (so 5/6 of the
PCI bw is usable = ~92 MB/s)
raid10 will transfer 3 copy blocks per 3 data blocks (so 1/2 of the
PCI bw is usable = ~55 MB/s)
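
To put rough numbers on that (a quick back-of-the-envelope sketch in
Python; the ~110 MB/s figure for usable PCI bandwidth is just my guess
for this box):

def usable_write_bw(bus_mb_s, data_fraction):
    # data_fraction = share of bus traffic that is payload rather than
    # redundancy (parity or mirror copies)
    return bus_mb_s * data_fraction

PCI_BW = 110.0  # MB/s, assumed practical throughput of a 32bit/33MHz PCI bus

# raid5 over 6 disks: 5 data chunks + 1 parity chunk per full stripe
print("raid5 :", round(usable_write_bw(PCI_BW, 5.0 / 6.0)))   # ~92 MB/s

# raid10,n2 over 6 disks: every data chunk crosses the bus twice
print("raid10:", round(usable_write_bw(PCI_BW, 1.0 / 2.0)))   # ~55 MB/s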

Factoring in some contention / overhead my values might well be
normal. It just means that the fabled raid10 only performs if you have
high-bw buses, which this box sadly doesn't, or a hw controller where
redundant blocks don't go over the bus, which the 3ware 7506
apparently isn't.

I'll still go with raid10 for the 50% better random I/O, only less
enthusiastically.


Chris


* Re: raid10 vs raid5 - strange performance
  2008-03-25 21:33 raid10 vs raid5 - strange performance Christian Pernegger
  2008-03-25 22:13 ` David Rees
@ 2008-03-25 23:36 ` Keld Jørn Simonsen
       [not found]   ` <bb145bd20803251837x7ef1fa9dk6191fcaea7b02144@mail.gmail.com>
  1 sibling, 1 reply; 19+ messages in thread
From: Keld Jørn Simonsen @ 2008-03-25 23:36 UTC (permalink / raw)
  To: Christian Pernegger; +Cc: Linux RAID

On Tue, Mar 25, 2008 at 10:33:06PM +0100, Christian Pernegger wrote:
> I did a few tests of raid10 on an old 3ware 7506-8 today (using 6
> drives), both with the built-in raid 10 and md's n2 and chunk sizes
> from 64KB to 1024KB.
> - caches dropped before each test
> - averages of three runs
> - arrays in synced state
> - dd tests: 6GB (3MB bs)

What are the specifics for the raid10? Which layout? How many copies?

Best regards
keld


* Re: raid10 vs raid5 - strange performance
  2008-03-25 22:56   ` Christian Pernegger
@ 2008-03-26 16:29     ` Bill Davidsen
  2008-03-26 17:10       ` Christian Pernegger
  0 siblings, 1 reply; 19+ messages in thread
From: Bill Davidsen @ 2008-03-26 16:29 UTC (permalink / raw)
  To: Christian Pernegger; +Cc: David Rees, Linux RAID

Christian Pernegger wrote:
>>  Not surprising at all. Read performance is similar between the two
>>  setups as expected (appears to be limited by the PCI bus).
>>     
>
> Yes, reads are fine.
>
>   
>>  Streaming write performance is better because you are writing less
>>  redundant data to disks, you can now stripe writes over 5 disks
>>  instead of 3.
>>     
>
> Sounds reasonable. Write performance SHOULD be ~5x single disk for
> raid5 and ~3x single disk for raid10 in a theoretical best-case
> scenario, either should hit the PCI bus cap. In reality the ratios are
> more like 1x for raid5 and 0.6 -1.1x for raid10. It's just that I'd
> expected ~ identical and significantly better streaming write
> performance from both array configurations ... then again, if you
> think it over,
>
> raid5 will transfer 1 parity block per 5 data blocks (so 5/6 of the
> PCI bw are usable = 92MB)
> raid10 will transfer 3 copy blocks per 3 data blocks (so 1/2 of the
> PCI bw are usable = 55MB)
>
> Factoring in some contention / overhead my values might well be
> normal. It just means that the fabled raid10 only performs if you have
> high-bw buses, which this box sadly doesn't, or a hw controller where
> redundant blocks don't go over the bus, which the 3ware 7506
> apparently isn't.
>
> I'll still go with raid10 for the 50% better random I/O, only less
> enthusiastically.
>   

I think that you should treat 10,n2 and 10,f2 as separate 
configurations, and test them as such. Their behavior is quite different, 
and one or the other might be a better fit to your usage.

The ability to transfer a single copy of the data and no parity 
information is an advantage of hardware controllers.

-- 
Bill Davidsen <davidsen@tmr.com>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismark 




* Re: raid10 vs raid5 - strange performance
  2008-03-26 16:29     ` Bill Davidsen
@ 2008-03-26 17:10       ` Christian Pernegger
  2008-03-26 17:49         ` Keld Jørn Simonsen
  0 siblings, 1 reply; 19+ messages in thread
From: Christian Pernegger @ 2008-03-26 17:10 UTC (permalink / raw)
  To: linux-raid

>  I think that you should treat 10,n2 and 10,f2 as separate
>  configurations,

Certainly, but f2 write performance is supposed to be even worse than
n2's, even in theory. Maybe I'll try it anyway.

>  The ability to transfer a single copy of the data and no parity
>  information is an advantage of hardware controllers.

The 3ware 7506-8 *is* a hardware controller and it quite obviously
does *not* have this ability, at least not for raid10. One can just as
well use the 3ware as a plain 8-port controller and run md over that;
it doesn't make much of a difference. I find that interesting.

Cheers,

C.


* Re: raid10 vs raid5 - strange performance
       [not found]     ` <20080326072416.GA8674@rap.rap.dk>
@ 2008-03-26 17:16       ` Christian Pernegger
  2008-03-26 19:29         ` Keld Jørn Simonsen
  0 siblings, 1 reply; 19+ messages in thread
From: Christian Pernegger @ 2008-03-26 17:16 UTC (permalink / raw)
  To: Keld Jørn Simonsen; +Cc: Linux RAID

>  OK, I think the normal hype on raid10 is to use the far layout for its
>  better striping performance.

Wasn't f2 supposed to improve *reads*? Maybe I've got it backwards.

>  Are the disks SATA disks or SCSI disks?

PATA. It's a how-to-best-utilize-some-older-stuff project.

>  you could probably get some more performance by moving 2 disks
>  to the PATA controller of the motherboard,

The board is newish and has just one PATA channel and that's
explicitly for CDROM use.

>  I think it is interesting that your 3ware controller is not maxing out
>  the PCI bus! With 6 disks of nominally 60 MB/s the  theoretical max is
>  360 MB/s. So the 3ware controller really has IO bandwidth available
>  which it is not getting close on putting to use.

Yes, that's the juicy part :)

>  What is the specifics for the 3ware controller?

http://www.3ware.com/products/parallel_ata.asp

It's the 8port version. Maybe their newer controllers are better but
this one frankly sucks at anything but raid1 and maybe raid0 ...

Cheers,

C.


* Re: raid10 vs raid5 - strange performance
  2008-03-26 17:10       ` Christian Pernegger
@ 2008-03-26 17:49         ` Keld Jørn Simonsen
  0 siblings, 0 replies; 19+ messages in thread
From: Keld Jørn Simonsen @ 2008-03-26 17:49 UTC (permalink / raw)
  To: Christian Pernegger; +Cc: linux-raid

On Wed, Mar 26, 2008 at 06:10:17PM +0100, Christian Pernegger wrote:
> >  I think that you should treat 10,n2 and 10,f2 as separate
> >  configurations,
> 
> Certainly, but f2 write performance is supposed to be even worse than
> n2 even in theory. Maybe I'll try it anyway.

Nah, raid10,n2 and raid10,f2 have about equal writing performance,
both for sequential and random writes, and according to my test,
f2 is actually a little better than n2.

> >  The ability to transfer a single copy of the data and no parity
> >  information is an advantage of hardware controllers.
> 
> The 3ware 7506-8 *is* a hardware controller and it quite obviously
> does *not* have this ability, at least not for raid10. One can just as
> well use the 3ware as a plain 8port controller and md over that,
> doesn't make much of a difference. I find that interesting.

I do not believe the 3ware controller has raid10 in the MD sense.
The raid10 that 3ware may have is most likely a raid1+0.

Best regards
keld


* Re: raid10 vs raid5 - strange performance
  2008-03-26 17:16       ` Christian Pernegger
@ 2008-03-26 19:29         ` Keld Jørn Simonsen
  2008-03-27  1:11           ` Christian Pernegger
       [not found]           ` <47EAACAB.7070203@tmr.com>
  0 siblings, 2 replies; 19+ messages in thread
From: Keld Jørn Simonsen @ 2008-03-26 19:29 UTC (permalink / raw)
  To: Christian Pernegger; +Cc: Linux RAID

On Wed, Mar 26, 2008 at 06:16:30PM +0100, Christian Pernegger wrote:
> >  OK, I think the normal hype on raid10 is to use the far layout for its
> >  better striping performance.
> 
> Wasn't f2 supposed to improve *reads*? Maybe I've got it backwards.
> 
> >  Are the disks SATA disks or SCSI disks?
> 
> PATA. It's a how-to-best-utilize-some-older-stuff project.
> 
> >  you could probably get some more performance by moving 2 disks
> >  to the PATA controller of the motherboard,
> 
> The board is newish and has just one PATA channel and that's
> explicitly for CDROM use.
> 
> >  I think it is interesting that your 3ware controller is not maxing out
> >  the PCI bus! With 6 disks of nominally 60 MB/s the  theoretical max is
> >  360 MB/s. So the 3ware controller really has IO bandwidth available
> >  which it is not getting close on putting to use.
> 
> Yes, that's the juicy part :)
> 
> >  What is the specifics for the 3ware controller?
> 
> http://www.3ware.com/products/parallel_ata.asp
> 
> It's the 8port version. Maybe their newer controllers are better but
> this one frankly sucks at anything but raid1 and maybe raid0 ...

I don't know if it is any worse than other hardware controllers.
I have two on-board SATA controller chips, each with its own version of HW
RAID (NVIDIA and SIL), but I hesitate to try them out as I do not believe
I can configure raids at the partition level (only at the disk level, I
believe). 

I think there is a tendency that SW raid like Linux kernel MD and Sun ZFS
etc are more intelligent and can thus obtain better performance than HW
RAID. HW RAID has the advantage as Bill says that you only need to
transfer the data once when writing, and that parity calculation is
offloaded from the main CPUs. 

If you reuse older hw, then I would say that you really have a problem
with the PCI bus here, and that a motherboard with better bus
performance could really boost the overall performance of your older HW. 
You should be able to get something like a 3-fold IO improvement if you
could avoid the PCI bottleneck.

best regards
keld


* Re: raid10 vs raid5 - strange performance
  2008-03-26 19:29         ` Keld Jørn Simonsen
@ 2008-03-27  1:11           ` Christian Pernegger
  2008-03-27  9:18             ` Keld Jørn Simonsen
       [not found]           ` <47EAACAB.7070203@tmr.com>
  1 sibling, 1 reply; 19+ messages in thread
From: Christian Pernegger @ 2008-03-27  1:11 UTC (permalink / raw)
  To: Keld Jørn Simonsen; +Cc: Linux RAID

>  I have two on-board sata controller chips with each their version of HW
>  RAID (NVIDIA and SIL)

I don't know of any nVidia or Silicon Image onboard chips that can do
RAID in HW. Most likely it's BIOS-assisted software RAID. The 3ware
should be the real article.

>  I think there is a tendency that SW raid like Linux kernel MD and Sun ZFS
>  etc are more intelligent and can thus obtain better performance than HW
>  RAID.

Performance is one thing, flexibility is another ...
- end-to-end checksumming
- ECC-on-read and / or bg scrubbing
- arbitrary percentage of parity data (fs-level PAR2)
- virtualized storage pools with actual backing store chosen based on
usage hints
- all configurable rule-based with file level granularity

Sort of like a stable ZFS+raif variant ... one can only dream :-)

>  HW RAID has the advantage as Bill says that you only need to
>  transfer the data once when writing, and that parity calculation is
>  offloaded from the main CPUs.

Yes, maybe current cards conserve bus bandwidth even for RAID 10. Mine
doesn't, and there's no parity to offload. I don't like HW controllers
much since the on-disk format is closed ... if the controller dies you
have to get a very similar one, which can be a pain.

>  If you reuse older hw, then I would say that you really have a problem
>  with the PCI bus here, and that a motherboard with better bus
>  performance

The board has 5 bus segments, but most of them are PCIe. Boards with decent
PCI slots are getting hard to find ...

Nah, I'll just live with it until the disks start croaking ^^

Thanks,

C.


* Re: raid10 vs raid5 - strange performance
       [not found]           ` <47EAACAB.7070203@tmr.com>
@ 2008-03-27  2:02             ` Christian Pernegger
  2008-03-29 20:25               ` Bill Davidsen
  0 siblings, 1 reply; 19+ messages in thread
From: Christian Pernegger @ 2008-03-27  2:02 UTC (permalink / raw)
  To: Linux RAID

>  After doing a little research, I see that the original slowest form of PCI
>  was 32 bit 33MHz, with a bandwidth of ~127MB/s.

That's still the prevalent form, for anything else you need an (older)
server or workstation board. 133MB/s in theory, 80-100MB/s in
practice.

>  The most common hardware used the v2.1 spec, which was 64 bit at 66MHz.

I don't think the spec version has anything to do with speed ratings, really.

>  I would expect operation at UDMA/66

What's UDMA66 got to do with anything?

>  Final thought, these drives are paired on the master slave of the same
>  cable, are they? That will cause them to really perform badly.

The cables are master-only, I'm pretty sure the controller doesn't
even do slaves.


To wrap it up:
- on a regular 32bit/33MHz PCI bus md-RAID10 is hurt really badly by
having to transfer data twice in every case.
- the old 3ware 7506-8 doesn't accelerate RAID-10 in any way, even
though it's a hardware RAID controller, possibly because it's more of
an afterthought.


On the 1+0 vs RAID10 debate ... 1+0 = 10 is usually used to mean a
stripe of mirrors, while 0+1 = 01 is a less optimal mirror of stripes.
The md implementation doesn't really do a stacked raid, but with the n2
layout the data distribution should be identical to 1+0 / 10.
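
For what it's worth, a quick sanity check in Python (ignoring chunk
size, and assuming an even number of disks with 2 copies) that maps
chunk numbers to (disk, stripe) pairs under both descriptions:

def raid10_n2(chunk, disks=6, copies=2):
    # simplified model of md's "near" layout: copies are laid out
    # consecutively across the disks; slot s lands on disk s % disks
    # at stripe s // disks
    slots = range(chunk * copies, chunk * copies + copies)
    return {(s % disks, s // disks) for s in slots}

def raid1_plus_0(chunk, disks=6):
    # classic 1+0: stripe chunks across disks // 2 mirror pairs
    pairs = disks // 2
    pair, stripe = chunk % pairs, chunk // pairs
    return {(2 * pair, stripe), (2 * pair + 1, stripe)}

# identical placement for every chunk (6 disks, 2 copies)
assert all(raid10_n2(c) == raid1_plus_0(c) for c in range(10000))
print("n2 placement matches a stripe over mirror pairs")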

C.


* Re: raid10 vs raid5 - strange performance
  2008-03-27  1:11           ` Christian Pernegger
@ 2008-03-27  9:18             ` Keld Jørn Simonsen
  0 siblings, 0 replies; 19+ messages in thread
From: Keld Jørn Simonsen @ 2008-03-27  9:18 UTC (permalink / raw)
  To: Christian Pernegger; +Cc: Linux RAID

On Thu, Mar 27, 2008 at 02:11:43AM +0100, Christian Pernegger wrote:
> >  I have two on-board sata controller chips with each their version of HW
> >  RAID (NVIDIA and SIL)
> 
> I don't know of any nVIdia or Silicon Image onboard chips that can do
> raid in HW. Most likely it's BIOS-assisted software RAID. The 3ware
> should be the real article.

I am not sure of the difference. My understanding is that for both
the NVIDIA and the SIL, and also for the 3ware, all the logic is done in
the chip/controller.

I even think that the onboard chips have their own memory? How else would
it be possible for them to do RAID5 and RAID6 parity calculation?

> >  I think there is a tendency that SW raid like Linux kernel MD and Sun ZFS
> >  etc are more intelligent and can thus obtain better performance than HW
> >  RAID.
> 
> Performance is one thing, flexibility is another ...
> - end-to-end checksumming
> - ECC-on-read and / or bg scrubbing
> - arbitrary percentage of parity data (fs-level PAR2)
> - virtualized storage pools with actual backing store chosen based on
> usage hints
> - all configurable rule-based with file level granularity

Even for performance, SW raid tends to be more intelligent.
Witness your example of a 3ware controller maxing out on its own logic,
so that it is not the PCI bus bandwidth that limits the IO.
And 3ware is one of the best-known HW raid vendors, and thus one of the
vendors that presumably have the greatest knowledge of HW raid, and
accordingly can produce the most intelligent controllers.

SW raid can do something like raid10,f2, which has much better optimized
IO than traditional RAID1+0, and RAID5 also has more intelligent
writing algorithms than I would imagine most HW controllers have,
as MD raid5 uses different strategies for sequential and random
writes. There are probably more examples of clever IO performance
tricks in Linux raid. 

The bottom line is that there are more people involved in making
Linux SW raid than there are for each of the closed drivers that HW
vendors make, and the accumulated knowledge of the open source
community will produce better software because of the much greater pool
of people working on it.

best regards
keld


* Re: raid10 vs raid5 - strange performance
  2008-03-27  2:02             ` Christian Pernegger
@ 2008-03-29 20:25               ` Bill Davidsen
  2008-03-29 21:26                 ` Iustin Pop
  2008-03-30  8:55                 ` Keld Jørn Simonsen
  0 siblings, 2 replies; 19+ messages in thread
From: Bill Davidsen @ 2008-03-29 20:25 UTC (permalink / raw)
  To: Christian Pernegger; +Cc: Linux RAID

Christian Pernegger wrote:
>>  After doing a little research, I see that the original slowest form of PCI
>>  was 32 bit 33MHz, with a bandwidth of ~127MB/s.
>>     
>
> That's still the prevalent form, for anything else you need an (older)
> server or workstation board. 133MB/s in theory, 80-100MB/s in
> practice.
>
>   
I just looked at a few old machines running here, ASUS P4P800 (P4-2.8 
w/HT), P5GD1 (E6600), and A7V8X-X (Duron) boards, all 2-5 years old, and 
lspci shows 66MHz devices on the bus of all of them, and the two Intel 
ones have 64-bit devices attached.
>>  The most common hardware used the v2.1 spec, which was 64 bit at 66MHz.
>>     
>
> I don't think the spec version has anything to do with speed ratings, really.
>
>   
That was the version which included 66MHz and 64-bit. I believe any 
board using 184-, 200-, or 240-pin (from memory) RAM is v2.1, and probably 
runs a fast bus. Pretty much anything not using PC-100 memory. See 
Wikipedia or similar about the versions.
>>  I would expect operation at UDMA/66
>>     
>
> What's UDMA66 got to do with anything?
>
>   
Sorry, you said this was a PATA system, I was speculating that your 
drives were UDMA/66 or faster. Otherwise the disk may be the issue, not 
the bus. Note: may... be.
>>  Final thought, these drives are paired on the master slave of the same
>>  cable, are they? That will cause them to really perform badly.
>>     
>
> The cables are master-only, I'm pretty sure the controller doesn't
> even do slaves.
>
>   
Good, drop one possible issue.
> To wrap it up
> - on a regular 32bit/33Mhz PCI bus md-RAID10 is hurt really badly by
> having to transfer data twice in every case.
> - the old 3ware 7506-8 doesn't accelerate RAID-10 in any way, even
> though it's a hardware RAID controller, possibly because it's more of
> an afterthought.
>
>
> On the 1+0 vs RAID10 debate ... 1+0 = 10 is usually used to mean a
> stripe of mirrors, while 0+1 = 01 is a less optimal mirror of stripes.
> The md implementation doesn't really do a stacked raid but with n2
> layout the data distribution should be identical to 1+0 / 10.
>   

The md raid10,f2 generally has modest write performance, if U is a 
single drive speed, write might range between 1.5U to (N-1)/2*U 
depending on tuning. Read speed is almost always (N-1)*U, which is great 
for many applications. Playing with chunk size, chunk buffers, etc, can 
make a large difference in write performance.

-- 
Bill Davidsen <davidsen@tmr.com>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismark 




* Re: raid10 vs raid5 - strange performance
  2008-03-29 20:25               ` Bill Davidsen
@ 2008-03-29 21:26                 ` Iustin Pop
  2008-03-30  8:55                 ` Keld Jørn Simonsen
  1 sibling, 0 replies; 19+ messages in thread
From: Iustin Pop @ 2008-03-29 21:26 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Christian Pernegger, Linux RAID

On Sat, Mar 29, 2008 at 04:25:31PM -0400, Bill Davidsen wrote:
> Christian Pernegger wrote:
>>>  After doing a little research, I see that the original slowest form of PCI
>>>  was 32 bit 33MHz, with a bandwidth of ~127MB/s.
>>>     
>>
>> That's still the prevalent form, for anything else you need an (older)
>> server or workstation board. 133MB/s in theory, 80-100MB/s in
>> practice.
>>
>>   
> I just looked at a few old machine running here, ASUS P4P800 (P4-2.8  
> w/HT), P5GD1 (E6600), and A7V8X-X (Duron) boards, all 2-5 years old, and  
> lspci shows 66MHz devices on the bus of all of then, and the two Intel  
> ones have 64-bit devices attached.
>>>  The most common hardware used the v2.1 spec, which was 64 bit at 66MHz.
>>>     
>>
>> I don't think the spec version has anything to do with speed ratings, really.
>>
>>   
> That was the version which included 66MHz and 64-bit. I believe any
> board using 184, 200, or 240 (from memory) RAM is v2.1, and probably
> runs a fast bus. Pretty much anything not using PC-100 memory. See
> wikipedia or similar about the versions.

I think Christian is right - while PCI 2.1 allows faster buses, it does
not require them. AFAIK, no consumer boards ever included faster PCI
busses. The only version of PCI that was indeed faster was present on
server/workstation boards, but that is a different type of slot.

In order to run at 64bit, the physical slot must be twice the length (as
it's a parallel bus), which can easily be seen on Wikipedia:
http://en.wikipedia.org/wiki/Image:Pci-slots.jpg is the standard 32-bit
PCI slot, whereas http://en.wikipedia.org/wiki/Image:64bitpci.jpg is the
64 bit one - note the double length and the different keying.

The difference between 33MHz and 66MHz is that, while the standard
allows both, it requires a different keying of the slot. 66MHz is only
available on 3.3V slots, which are new in version 2.1, and I think no
vendor risked introducing a different slot just to provide more speed
and (possibly) sacrifice compatibility. Wikipedia says:

    * Card
          o 32-bit, 33 MHz (added in Rev. 2.0)
          o 64-bit, 33 MHz (added in Rev. 2.0)
          o 32-bit, 66 MHz (3.3 V only, added in Rev. 2.1)
          o 64-bit, 66 MHz (3.3 V only, added in Rev. 2.1)
    * Slot
          o 32-bit, 5 V (most common on desktop mainboard)
          o 32-bit, 3.3 V (rare)
          o 64-bit, 5 V (less common, but can also be found on earlier server mainboard)
          o 64-bit, 3.3 V (most common on server mainboard before PCI-X appears)

In the above, you can see that a 32 bit, 66 MHz card requires a 32 bit,
3.3 V slot, which is keyed differently; see (link from Wikipedia)
http://www94.web.cern.ch/hsi/s-link/devices/s32pci64/slottypes.html. I
found this picture of the A7V8X-X board here:
http://www.motherboard.cz/mb/asus/a7v8x-x_l.jpg and it clearly looks
like a standard 32bit/33MHz/5V slot.

I only know all this because I fought with a 32-bit, 66MHz SATA card a few
years ago, hoping that I could get a consumer board that runs at 66MHz. No
luck though.

The fact that lspci shows you 66 MHz capability for some devices doesn't
mean the bus itself runs at 66 MHz. Also, as the bus is shared, you
would need *only* 66MHz devices for the bus to run at that speed.

I'm not an expert in this area, so I'm happy to be corrected if I'm
wrong. I personally never saw a 32bit/66MHz slot, and I have seen 64bit
only on server boards, as PCI-X.

regards,
iustin


* Re: raid10 vs raid5 - strange performance
  2008-03-29 20:25               ` Bill Davidsen
  2008-03-29 21:26                 ` Iustin Pop
@ 2008-03-30  8:55                 ` Keld Jørn Simonsen
  2008-03-30  9:34                   ` Keld Jørn Simonsen
  1 sibling, 1 reply; 19+ messages in thread
From: Keld Jørn Simonsen @ 2008-03-30  8:55 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Christian Pernegger, Linux RAID

On Sat, Mar 29, 2008 at 04:25:31PM -0400, Bill Davidsen wrote:
> Christian Pernegger wrote:
> 
> The md raid10,f2 generally has modest write performance, if U is a 
> single drive speed, write might range between 1.5U to (N-1)/2*U 
> depending on tuning. Read speed is almost always (N-1)*U, which is great 
> for many applications. Playing with chunk size, chunk buffers, etc, can 
> make a large difference in write performance.

Hmm, I have other formulae for this. raid10,f2 write speed would rather
be U*N/2, and read speed be U*N - possibly enhanced by also having
bigger chunks than on a regular non-raid disk, and enhanced by lower
access times. The formulae are both for sequential and random reads.
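
Just to make the numbers concrete, a tiny Python sketch (the drive
speed is purely illustrative) that evaluates both sets of rules of
thumb side by side:

U = 60.0   # assumed single-drive streaming speed, MB/s
N = 6      # number of drives

# Bill's range for raid10,f2 writes, and his read estimate
bill_write_low, bill_write_high = 1.5 * U, (N - 1) / 2.0 * U
bill_read = (N - 1) * U

# the formulae above
keld_write = U * N / 2.0
keld_read = U * N

print("write: %g-%g vs %g MB/s" % (bill_write_low, bill_write_high, keld_write))
print("read : %g vs %g MB/s" % (bill_read, keld_read))
# write: 90-150 vs 180 MB/s; read: 300 vs 360 MB/s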

Best regards
keld


* Re: raid10 vs raid5 - strange performance
  2008-03-30  8:55                 ` Keld Jørn Simonsen
@ 2008-03-30  9:34                   ` Keld Jørn Simonsen
  2008-03-30 11:16                     ` Peter Grandi
  2008-03-30 14:21                     ` Keld Jørn Simonsen
  0 siblings, 2 replies; 19+ messages in thread
From: Keld Jørn Simonsen @ 2008-03-30  9:34 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Christian Pernegger, Linux RAID

On Sun, Mar 30, 2008 at 10:55:28AM +0200, Keld Jørn Simonsen wrote:
> On Sat, Mar 29, 2008 at 04:25:31PM -0400, Bill Davidsen wrote:
> > Christian Pernegger wrote:
> > 
> > The md raid10,f2 generally has modest write performance, if U is a 
> > single drive speed, write might range between 1.5U to (N-1)/2*U 
> > depending on tuning. Read speed is almost always (N-1)*U, which is great 
> > for many applications. Playing with chunk size, chunk buffers, etc, can 
> > make a large difference in write performance.
> 
> Hmm, I have other formulae for this. raid10,f2 write speed would rather
> be U*N/2, and read speed be U*N - possibly enhanced by also having
> bigger chunks than on a regular non-raid disk, and enhanced by lower
> access times. The formulae are both for sequential and random reads.

And also faster transfer rates due to using the outer tracks of the
disk. This factor could amount to up to a factor of 2 when reading from
the high end of the array vs reading from the high end of the bare disk.

best regards
keld


* Re: raid10 vs raid5 - strange performance
  2008-03-30  9:34                   ` Keld Jørn Simonsen
@ 2008-03-30 11:16                     ` Peter Grandi
  2008-03-30 12:58                       ` Keld Jørn Simonsen
  2008-03-30 14:21                     ` Keld Jørn Simonsen
  1 sibling, 1 reply; 19+ messages in thread
From: Peter Grandi @ 2008-03-30 11:16 UTC (permalink / raw)
  To: Linux RAID

[ ... ]

>>> The md raid10,f2 generally has modest write performance, if
>>> U is a single drive speed, write might range between 1.5U to
>>> (N-1)/2*U depending on tuning. Read speed is almost always
>>> (N-1)*U, which is great for many applications. Playing with
>>> chunk size, chunk buffers, etc, can make a large difference
>>> in write performance.

>> Hmm, I have other formulae for this. raid10,f2 write speed
>> would rather be U*N/2, and read speed be U*N - possibly
>> enhanced by also having bigger chunks than on a regular
>> non-raid disk, and enhanced by lower access times. The
>> formulae are both for sequential and random reads.

Well, that's very optimistic, because writing to different
halves of disks in a staggered way has two impacts.

For example, as you say here, the bottom "mirror" half of each disk
can be rather slower than the outer "read" half:

> And also faster transfer rates due to using the outer tracks
> of the disk. This factor could amount to up to a factor of 2
> when reading from the high end of the array vs reading from
> the high end of the bare disk.

But then for writing on RAID10 f2, writing to an outer and an inner
half only slightly reduces the surface write speed *across the
RAID10*: in RAID10 n2 write speed goes from, say, max(80,80)MB/s to
max(40,40)MB/s as one writes each disk top to bottom, with an average
of 60MB/s, but on RAID10 f2 it goes from max(80,60)MB/s to
max(60,40)MB/s, or an average of 50MB/s.

In other words if one looks at the longitudinal (sequential)
speed, RAID10 f2 read speed is that of the first half, as you
write, but write speed is limited to that of the second half
(because in writing to both halves one must wait for both writes
to complete).

But write speed is not just longitudinal speed, and things get
complicated because of the different latitudes of writing,
involving seeking between inner and outer half on long writes.

RAID10 f2 in effect means "mirror the upper half of a disk onto
the lower half of the next disk".

Suppose then a write to chunk 0, and that all disks are 250MB ones,
at rest with their arms on cylinder 0: the sequence of
block writes that make up the chunk write goes to both the upper
half of the first disk and to the lower half of the second disk
nearly simultaneously, and total time is

  max(
    (rotational latency+write 1 chunk at 80MB/s),
    (seek to cylinder 15200 + (rotational latency+write 1 chunk at 60MB/s))
  )

But now suppose that you are writing *two* chunks back-to-back,
the queue of requests on the first 3 disks will be:

   first:	write chunk 0 to cylinder 0

   second:	write chunk 0 to cylinder 15200
		write chunk 1 to cylinder 0

   third:	write chunk 1 to cylinder 15200

There is latitudinal interference between writing a mirror copy
of chunk 0 to the lower half of the second disk and the writing
immediately afterwards of the first copy of chunk 1 to the upper
half of the same disk.

Of course if you write many chunks, the situation that happens
here on the second disk will happen on all disks, and all disks
will be writing to some cylinder in the second half of each disk
and to 15200 cylinders above that.
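
A small Python sketch of that request pattern (disk count, chunk
indexing and the 15200-cylinder figure are all made up for
illustration, and the elevator is ignored entirely):

N = 6          # disks
HALF = 15200   # cylinder where the lower "mirror" half starts (made-up)

def writes_for_chunk(c):
    # f2, per the description above: the first copy of chunk c goes to
    # the upper half of disk c mod N, the mirror to the lower half of
    # the next disk
    return [(c % N, c // N),                 # (disk, cylinder)
            ((c + 1) % N, HALF + c // N)]

queues = {d: [] for d in range(N)}
for c in range(4):                           # four chunks back-to-back
    for disk, cyl in writes_for_chunk(c):
        queues[disk].append(cyl)

for disk in sorted(queues):
    print("disk", disk, "->", queues[disk])
# disk 1 gets cylinder 15200 (mirror of chunk 0) immediately followed by
# cylinder 0 (first copy of chunk 1): the latitudinal interference above.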

The cost of each seek and how many seeks depend on the disk and
chunk size (as pointed out in the quote above) and how fast
write requests are issued and the interaction with the elevator;
for example I'd guess that 'anticipatory' is good with RAID10
f2, but given the unpleasant surprises with the rather demented
(to use a euphemism) queueing logic within Linux that would have
to be confirmed.


* Re: raid10 vs raid5 - strange performance
  2008-03-30 11:16                     ` Peter Grandi
@ 2008-03-30 12:58                       ` Keld Jørn Simonsen
  0 siblings, 0 replies; 19+ messages in thread
From: Keld Jørn Simonsen @ 2008-03-30 12:58 UTC (permalink / raw)
  To: Peter Grandi; +Cc: Linux RAID

On Sun, Mar 30, 2008 at 12:16:59PM +0100, Peter Grandi wrote:
> [ ... ]
> 
> >>> The md raid10,f2 generally has modest write performance, if
> >>> U is a single drive speed, write might range between 1.5U to
> >>> (N-1)/2*U depending on tuning. Read speed is almost always
> >>> (N-1)*U, which is great for many applications. Playing with
> >>> chunk size, chunk buffers, etc, can make a large difference
> >>> in write performance.
> 
> >> Hmm, I have other formulae for this. raid10,f2 write speed
> >> would rather be U*N/2, and read speed be U*N - possibly
> >> enhanced by also having bigger chunks than on a regular
> >> non-raid disk, and enhanced by lower access times. The
> >> formulae are both for sequential and random reads.
> 
> Well, that's very optimistic, because writing to different
> halves of disks in a staggered way has two impacts.

Nonetheless, that is what my tests show, for writes.
Maybe the elevator saves me there. Have you got other figures?

But my test is done on a completely new partition, and thus 
I think that the test would tend to use the first sectors in the
partition, making this faster than average. This would most likely be the
same for other benchmarks.

> For example, as you say here, the bottom "mirror" half of each disk
> can be rather slower than the outer "read" half:
> 
> > And also faster transfer rates due to using the outer tracks
> > of the disk. This factor could amount to up to a factor of 2
> > when reading from the high end of the array vs reading from
> > the high end of the bare disk.
> 
> But then for writing on RAID10 f2 writing to an outer and inner
> half only reduces a little the surface write speed *across the
> RAID10*: in RAID10 n2 write speed goes from say mx(80,80)MB/s to
> max(40,40)MB/s as one writes each disk top to bottom, with an
> average of 60MB/s, but on RAID10 f2 it goes from max(80,60)MB/s
> to max(60,40)MB/s, or average 50MB/s.

I did explicitly say "when reading". I agree that the writing
speed would be more constant for f2 than for n2, but it would also decline 
as the higher-end sectors are written.

For reading, I would estimate that the striped reading of only half the
disk with the f2 layout will improve speed by about 17 % on average over
the whole array. This figure is most likely constant over disk size for
3.5" disks, as it depends only on geometry, given the fixed rotational
speed and the radii of the inner and outer tracks.
I measured this with badblocks over a whole disk and over half the disk,
respectively, but you can probably get the same result by pure mathematics.
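
You can indeed get close to that figure from geometry alone. A small
Python sketch, assuming usable platter radii of roughly 21 mm and 46 mm
(made-up but plausible values for a 3.5" disk) and constant areal
density:

from math import sqrt

r_in, r_out = 21.0, 46.0   # assumed usable platter radii, mm

# Constant areal density: data per track ~ r, so the transfer rate at
# radius r is ~ r, and the data between r_in and r is ~ (r**2 - r_in**2).

def mean_rate(r0, r1):
    # data-weighted average transfer rate between radii r0 and r1
    return (2.0 / 3.0) * (r1**3 - r0**3) / (r1**2 - r0**2)

r_half = sqrt((r_in**2 + r_out**2) / 2)   # radius splitting the data in half

gain = mean_rate(r_half, r_out) / mean_rate(r_in, r_out) - 1
print("outer-half speedup: about %.0f %%" % (100 * gain))   # ~17 %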

> In other words if one looks at the longitudinal (sequential)
> speed, RAID10 f2 read speed is that of the first half, as you
> write, but write speed is limited to that of the second half
> (because in writing to both halves one must wait for both writes
> to complete).

I don't think the kernel waits for both writes to complete before
doing the next write. It just puts the write blocks in buffers
for the elevator to pick up later. That is also the reason why
sequential writing in raid5 can be so fast.

> But write speed is not just longitudinal speed, and things get
> complicated because of the different latitudes of writing,
> involving seeking between inner and outer half on long writes.

My tests show that for random writes this kind of evens out between the
different raid types, approximating to a general random writing rate.
 
> RAID10 f2 in effect means "mirror the upper half of a disk onto
> the lower half of the next disk".

Yes.

> Suppose then a write to chunk 0 and all disks are 250MB ones,
> are at rest and their arms are on cylinder 0: the sequence of
> block writes that make up the chunk write goes to both the upper
> half of the first disk and to the lower half of the second disk
> nearly simultaneously, and total time is
> 
>   max(
>     (rotational latency+write 1 chunk at 80MB/s),
>     (seek to cylinder 15200 + (rotational latency+write 1 chunk at 60MB/s))
>   )
> 
> But now suppose that you are writing *two* chunks back-to-back,
> the queue of requests on the first 3 disks will be:
> 
>    first:	write chunk 0 to cylinder 0
> 
>    second:	write chunk 0 to cylinder 15200
> 		write chunk 1 to cylinder 0
> 
>    third:	write chunk 1 to cylinder 15200
> 
> There is latitudinal interference between writing a mirror copy
> of chunk 0 to the lower half of the second disk and the writing
> immediately afterwards of the first copy of chunk 1 to the upper
> half of the same disk.
> 
> Of course if you write many chunks, the situation that happens
> here on the second disk will happen on all disks, and all disks
> will be writing to some cylinder in the second half of each disk
> and to 15200 cylinders above that.
> 
> The cost of each seek and how many seeks depend on the disk and
> chunk size (as pointed out in the quote above) and how fast
> write requests are issued and the interaction with the elevator;
> for example I'd guess that 'anticipatory' is good with RAID10
> f2, but given the unpleasant surprises with the rather demented
> (to use a euphemism) queueing logic within Linux that would have
> to be confirmed.

Yes, you are right in your analysis, but fortunately the elevator saves
us, picking up a large number of writes each time and thus minimizing
the effect of the latency problem for raid10,f2 sequential writing.

For random writing, I think this is random anyway, and it does not
matter much which layout you use.

Best regards
keld


* Re: raid10 vs raid5 - strange performance
  2008-03-30  9:34                   ` Keld Jørn Simonsen
  2008-03-30 11:16                     ` Peter Grandi
@ 2008-03-30 14:21                     ` Keld Jørn Simonsen
  1 sibling, 0 replies; 19+ messages in thread
From: Keld Jørn Simonsen @ 2008-03-30 14:21 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Christian Pernegger, Linux RAID

On Sun, Mar 30, 2008 at 11:34:20AM +0200, Keld Jørn Simonsen wrote:
> On Sun, Mar 30, 2008 at 10:55:28AM +0200, Keld Jørn Simonsen wrote:
> > On Sat, Mar 29, 2008 at 04:25:31PM -0400, Bill Davidsen wrote:
> > > Christian Pernegger wrote:
> > > 
> > > The md raid10,f2 generally has modest write performance, if U is a 
> > > single drive speed, write might range between 1.5U to (N-1)/2*U 
> > > depending on tuning. Read speed is almost always (N-1)*U, which is great 
> > > for many applications. Playing with chunk size, chunk buffers, etc, can 
> > > make a large difference in write performance.
> > 
> > Hmm, I have other formulae for this. raid10,f2 write speed would rather
> > be U*N/2, and read speed be U*N - possibly enhanced by also having
> > bigger chunks than on a regular non-raid disk, and enhanced by lower
> > access times. The formulae are both for sequential and random reads.
> 
> And also faster transfer rates due to using the outer tracks of the
> disk. This factor could amount to up to a factor of 2 when reading from
> the high end of the array vs reading from the high end of the bare disk.

The faster transfer rates for reading would amount to an improvement,
both for sequential and random reading, of about 17 % on average for
raid10,f2, given that it confines its reading to the outer, faster tracks
of the disks. For N>2 (f3, etc.) this does not get much better.

The lower average seek times will also be geometrically determined (as
are the transfer rates, as noted in another mail), and thus the speedup
will most likely be equal across disk sizes. The latency component for
raid10,f2 would improve by slightly more than a factor of two (to less
than half the average seek time). This will affect random reading. 

I did some calculations on the geometry of a CD, which has its inner
tracks at a radius of 15 mm and its outer tracks at a radius of 59 mm
(approximately). Because there is more data in the outer tracks, you
need fewer tracks to make up half the size of the disk, and the range
of the head movements is thus less than half of the original head span.
For a CD the head movement to cover half of the data on the outer part
would only be 16 mm, compared to 44 mm to cover the whole CD from
inner to outer tracks. This is a little over a third, which should
amount to a similar reduction in latency time. Given that there is less
data in the inner tracks, the improvement is somewhat offset by the
reduced probability that inner tracks will be read. I am sure that this
can be calculated on a strictly geometric basis, but I could not find the
formulae. Anyway, something like a 2-3 times improvement in latency
would be expected. And then some disks do not follow the simple mapping
between logical sectors and physical layout.
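
For the record, the same CD arithmetic in a few lines of Python (15 mm
and 59 mm radii as above, constant areal density assumed):

from math import sqrt

r_in, r_out = 15.0, 59.0   # CD inner/outer track radii, mm

# data between r_in and r is ~ (r**2 - r_in**2), so the radius that
# splits the data in half is:
r_half = sqrt((r_in**2 + r_out**2) / 2)

print("full head span : %.0f mm" % (r_out - r_in))     # 44 mm
print("outer-half span: %.0f mm" % (r_out - r_half))   # ~16 mm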

Average access time will improve for N>2; e.g. raid10,f3 will have about
50 % better latency than raid10,f2, and raid10,f4 will have only
around half the latency of raid10,f2.

Best regards
keld
