* raind-1 resync speed slow down to 50% by the time it finishes
@ 2009-07-30 6:25 Tirumala Reddy Marri
2009-07-30 7:35 ` Robin Hill
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: Tirumala Reddy Marri @ 2009-07-30 6:25 UTC (permalink / raw)
To: linux-raid
Hi,
I have two 1 TB disks in a RAID-1 configuration. When I started the
RAID-1 array the initial speed was 100 MB/s, but by the time it finished
the speed was <50 MB/s. Is there any reason for this behavior? Isn't the
speed supposed to be uniform?
Thanks,
marri
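For reference, the progress, finish estimate and current speed of a running resync can be read from /proc/mdstat; a minimal way to watch it:
    cat /proc/mdstat              # shows the progress bar, finish estimate and speed=...K/sec
    watch -n 5 cat /proc/mdstat   # refresh the view every 5 seconds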
* Re: raind-1 resync speed slow down to 50% by the time it finishes
2009-07-30 6:25 raind-1 resync speed slow down to 50% by the time it finishes Tirumala Reddy Marri
@ 2009-07-30 7:35 ` Robin Hill
2009-07-30 10:18 ` Keld Jørn Simonsen
2009-07-30 8:44 ` Mikael Abrahamsson
2009-07-30 18:35 ` Tracy Reed
2 siblings, 1 reply; 20+ messages in thread
From: Robin Hill @ 2009-07-30 7:35 UTC (permalink / raw)
To: linux-raid
On Wed Jul 29, 2009 at 11:25:47PM -0700, Tirumala Reddy Marri wrote:
> Hi,
> I have two 1 TB disks in a RAID-1 configuration. When I started the
> RAID-1 array the initial speed was 100 MB/s, but by the time it finished
> the speed was <50 MB/s. Is there any reason for this behavior? Isn't the
> speed supposed to be uniform?
>
No, the speed isn't uniform - it varies across the disk. The
_rotational_ speed is fixed (probably 7200 rpm), but that means the
outer tracks are passing at a higher _linear_ speed (i.e. in a single
rotation, there's more disk passing across the read head), so have a
higher transfer rate. Hard drives start writing from the outside, so
the speed drops off as you progress.
Cheers,
Robin
--
___
( ' } | Robin Hill <robin@robinhill.me.uk> |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |
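Robin's point is easy to verify with dd by reading once from the start of the disk and once from near the end; the device name and skip offset below are placeholders (the offset assumes a roughly 1 TB drive):
    # read 1 GiB from the outer (first) tracks
    dd if=/dev/sdX of=/dev/null bs=1M count=1024 iflag=direct
    # read 1 GiB from the inner tracks near the end of the disk
    dd if=/dev/sdX of=/dev/null bs=1M count=1024 skip=950000 iflag=direct
The second run should report a noticeably lower MB/s figure.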
* Re: raind-1 resync speed slow down to 50% by the time it finishes
2009-07-30 6:25 raind-1 resync speed slow down to 50% by the time it finishes Tirumala Reddy Marri
2009-07-30 7:35 ` Robin Hill
@ 2009-07-30 8:44 ` Mikael Abrahamsson
2009-07-30 18:35 ` Tracy Reed
2 siblings, 0 replies; 20+ messages in thread
From: Mikael Abrahamsson @ 2009-07-30 8:44 UTC (permalink / raw)
To: Tirumala Reddy Marri; +Cc: linux-raid
On Wed, 29 Jul 2009, Tirumala Reddy Marri wrote:
> Hi,
> I have two 1 TB disks in a RAID-1 configuration. When I started the
> RAID-1 array the initial speed was 100 MB/s, but by the time it finished
> the speed was <50 MB/s. Is there any reason for this behavior? Isn't the
> speed supposed to be uniform?
No, as the read heads reach the inner part of the disks, the transfer
speed goes down.
--
Mikael Abrahamsson email: swmike@swm.pp.se
* Re: raind-1 resync speed slow down to 50% by the time it finishes
2009-07-30 7:35 ` Robin Hill
@ 2009-07-30 10:18 ` Keld Jørn Simonsen
2009-07-30 20:11 ` David Rees
0 siblings, 1 reply; 20+ messages in thread
From: Keld Jørn Simonsen @ 2009-07-30 10:18 UTC (permalink / raw)
To: linux-raid
On Thu, Jul 30, 2009 at 08:35:54AM +0100, Robin Hill wrote:
> On Wed Jul 29, 2009 at 11:25:47PM -0700, Tirumala Reddy Marri wrote:
>
> > Hi,
> > I have two 1 TB disks in a RAID-1 configuration. When I started the
> > RAID-1 array the initial speed was 100 MB/s, but by the time it finished
> > the speed was <50 MB/s. Is there any reason for this behavior? Isn't the
> > speed supposed to be uniform?
> >
> No, the speed isn't uniform - it varies across the disk. The
> _rotational_ speed is fixed (probably 7200 rpm), but that means the
> outer tracks are passing at a higher _linear_ speed (i.e. in a single
> rotation, there's more disk passing across the read head), so have a
> higher transfer rate. Hard drives start writing from the outside, so
> the speed drops off as you progress.
There is a RAID type which can be seen as a raid1 variant but which avoids
some of the performance problems of raid1: raid10,f2, which
performs like raid0 for reading, and for reads only uses the faster
half of the disks, thus not degrading as much as raid1.
I think raid10,f2 only degrades 10-20 % while raid1 can degrade as much
as 50 %. For writing it is about the same, given that you use a file
system on top of the raid.
best regards
Keld
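For anyone wanting to try this, a 2-disk raid10 with the far layout Keld describes can be created with mdadm along these lines (device names are placeholders, and --create is destructive, so double-check them first):
    mdadm --create /dev/md0 --level=10 --layout=f2 --raid-devices=2 /dev/sdX1 /dev/sdY1
    cat /proc/mdstat    # confirm the array is up and watch the initial sync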
* Re: raind-1 resync speed slow down to 50% by the time it finishes
2009-07-30 6:25 raind-1 resync speed slow down to 50% by the time it finishes Tirumala Reddy Marri
2009-07-30 7:35 ` Robin Hill
2009-07-30 8:44 ` Mikael Abrahamsson
@ 2009-07-30 18:35 ` Tracy Reed
2009-07-30 20:28 ` David Rees
2 siblings, 1 reply; 20+ messages in thread
From: Tracy Reed @ 2009-07-30 18:35 UTC (permalink / raw)
To: Tirumala Reddy Marri; +Cc: linux-raid
On Wed, Jul 29, 2009 at 11:25:47PM -0700, Tirumala Reddy Marri spake thusly:
> I have two 1 TB disks in a RAID-1 configuration. When I started the
> RAID-1 array the initial speed was 100 MB/s, but by the time it finished
> the speed was <50 MB/s. Is there any reason for this behavior? Isn't the
> speed supposed to be uniform?
I currently have a RAID-1 rebuild running and cat /proc/mdstat shows
speed=1481K/sec. This seems incredibly slow to me - that's just about
one and a half megabytes per second. I'm wondering if this machine
has some more serious hardware problem. No errors are showing up in
dmesg or /var/log/messages or anything, though.
--
Tracy Reed
http://tracyreed.org
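A rebuild that slow usually means either heavy competing IO or a struggling drive; two quick checks, assuming smartmontools and sysstat are installed and with placeholder device names:
    smartctl -H /dev/sdX   # overall SMART health verdict
    smartctl -A /dev/sdX   # look for reallocated or pending sector counts
    iostat -x 5            # per-disk utilisation and wait times while the rebuild runs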
* Re: raind-1 resync speed slow down to 50% by the time it finishes
2009-07-30 10:18 ` Keld Jørn Simonsen
@ 2009-07-30 20:11 ` David Rees
2009-07-31 17:54 ` Keld Jørn Simonsen
0 siblings, 1 reply; 20+ messages in thread
From: David Rees @ 2009-07-30 20:11 UTC (permalink / raw)
To: Keld Jørn Simonsen; +Cc: linux-raid
2009/7/30 Keld Jørn Simonsen <keld@dkuug.dk>:
> I think raid10,f2 only degrades 10-20 % while raid1 can degrade as much
> as 50 %. For writing it is about the same, given that you use a file
> system on top of the raid.
Has anyone done any benchmarks of near vs far setups?
From what I understand, here's how performance should go for a 2-disk
raid10 setup:
Streaming/large reads far: Up to 100% faster since reads are striped
across both disks
Streaming/large reads near: Same as single disk as reads can't be
striped across both disks
Streaming/large writes far: Slower than single disk, since disks have
to seek to write. How much of a hit in performance will depend on
chunk size.
Streaming/large writes near: Same as single disk.
Random/small reads far: Up to 100% faster
Random/small reads near: Up to 100% faster
Random/small writes far: Same as single disk.
Random/small writes near: Same as single disk.
So basically, if you have a setup which mostly reads from disk, using
a far layout is beneficial, but if you have a setup which does a
higher percentage of writes, sticking to a near layout will be faster.
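These expectations can be checked empirically; as one possible sketch (fio is not part of the thread, and the device name and run times are placeholders), sequential and random read throughput of an array could be measured like this:
    fio --name=seq --filename=/dev/md0 --rw=read --bs=1M --direct=1 \
        --ioengine=libaio --iodepth=8 --runtime=30 --time_based --group_reporting
    fio --name=rand --filename=/dev/md0 --rw=randread --bs=4k --direct=1 \
        --ioengine=libaio --iodepth=16 --numjobs=4 --runtime=30 --time_based --group_reporting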
I recently set up an 8-disk RAID10 using eight 7200 rpm disks spread across 3 controllers.
5 disks are in an external enclosure via eSATA and a PCIe card.
2 disks are using onboard SATA controller
1 disk is using onboard IDE controller
I debated whether or not to use near or far, but ultimately stuck with
near for two reasons:
1. The array mostly sees write activity, streaming reads aren't that common.
2. I can only get about 120 MB/s out of the external enclosure because
of the PCIe card [1] , so being able to stripe reads wouldn't help get
any extra performance out of those disks.
-Dave
[1] http://ata.wiki.kernel.org/index.php/Hardware,_driver_status#Silicon_Image_3124
* Re: raind-1 resync speed slow down to 50% by the time it finishes
2009-07-30 18:35 ` Tracy Reed
@ 2009-07-30 20:28 ` David Rees
0 siblings, 0 replies; 20+ messages in thread
From: David Rees @ 2009-07-30 20:28 UTC (permalink / raw)
To: Tracy Reed; +Cc: Tirumala Reddy Marri, linux-raid
On Thu, Jul 30, 2009 at 11:35 AM, Tracy Reed<treed@ultraviolet.org> wrote:
> On Wed, Jul 29, 2009 at 11:25:47PM -0700, Tirumala Reddy Marri spake thusly:
>> I have two 1 TB disks in a RAID-1 configuration. When I started the
>> RAID-1 array the initial speed was 100 MB/s, but by the time it finished
>> the speed was <50 MB/s. Is there any reason for this behavior? Isn't the
>> speed supposed to be uniform?
>
> I currently have a RAID-1 rebuild running and cat /proc/mdstat shows
> speed=1481K/sec. This seems incredibly slow to me - that's just about
> one and a half megabytes per second. I'm wondering if this machine
> has some more serious hardware problem. No errors are showing up in
> dmesg or /var/log/messages or anything, though.
If there is any read/write activity on the array, resync speeds will
slow down significantly as it tries to minimize the effect on system
performance by default. You can try upping the min sync speed through
/proc to increase the sync speed, but array performance will suffer
more.
-Dave
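The knobs David refers to live under /proc/sys/dev/raid/ and are in KB/s per device; a minimal sketch with an illustrative value only:
    cat /proc/sys/dev/raid/speed_limit_min             # default is usually 1000 KB/s
    cat /proc/sys/dev/raid/speed_limit_max
    echo 50000 > /proc/sys/dev/raid/speed_limit_min    # let resync claim at least ~50 MB/s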
* Re: raind-1 resync speed slow down to 50% by the time it finishes
2009-07-30 20:11 ` David Rees
@ 2009-07-31 17:54 ` Keld Jørn Simonsen
2009-07-31 18:10 ` Keld Jørn Simonsen
2009-07-31 20:10 ` David Rees
0 siblings, 2 replies; 20+ messages in thread
From: Keld Jørn Simonsen @ 2009-07-31 17:54 UTC (permalink / raw)
To: David Rees; +Cc: linux-raid
On Thu, Jul 30, 2009 at 01:11:20PM -0700, David Rees wrote:
> 2009/7/30 Keld Jørn Simonsen <keld@dkuug.dk>:
> > I think raid10,f2 only degrades 10-20 % while raid1 can degrade as much
> > as 50 %. For writing it is about the same, given that you use a file
> > system on top of the raid.
>
> Has anyone done any benchmarks of near vs far setups?
Yes, there are a number of benchmarks on raid10 near/far scenarios
at http://linux-raid.osdl.org/index.php/Performance
> From what I understand, here's how performance should go for a 2-disk
> raid10 setup:
>
> Streaming/large reads far: Up to 100% faster since reads are striped
> across both disks
and possibly faster, due to far only using the faster half of the disk
for reading.
> Streaming/large reads near: Same as single disk as reads can't be
> striped across both disks
yes.
> Streaming/large writes far: Slower than single disk, since disks have
> to seek to write. How much of a hit in performance will depend on
> chunk size.
> Streaming/large writes near: Same as single disk.
Due to the elevator of the file system, writes are about the same for
both near and far.
> Random/small reads far: Up to 100% faster
Actually a bit more, because far only uses the faster half of the
disks. One test shows 132 % faster, which is consistent with theory.
> Random/small reads near: Up to 100% faster
One test shows 156 % faster.
> Random/small writes far: Same as single disk.
> Random/small writes near: Same as single disk.
yes.
> So basically, if you have a setup which mostly reads from disk, using
> a far layout is beneficial, but if you have a setup which does a
> higher percentage of writes, sticking to a near layout will be faster.
For reading, this is true, but for writing, it is not true, given that
you use a filesystem, with an elevator algorithm in use. The elevator
evens out the lesser performance of layout=far for a raw raid10,f2, so
that the performance is about the same for the near and far layouts.
> I recently set up an 8-disk RAID10 using eight 7200 rpm disks spread across 3 controllers.
>
> 5 disks are in an external enclosure via eSATA and a PCIe card.
> 2 disks are using onboard SATA controller
> 1 disk is using onboard IDE controller
>
> I debated whether or not to use near or far, but ultimately stuck with
> near for two reasons:
>
> 1. The array mostly sees write activity, streaming reads aren't that common.
> 2. I can only get about 120 MB/s out of the external enclosure because
> of the PCIe card [1] , so being able to stripe reads wouldn't help get
> any extra performance out of those disks.
Hmm, a PCIe x1 link should be able to get 2.5 Gbit/s = about 300 MB/s.
Wikipedia says 250 MB/s. It is strange that you can only get 120 MB/s.
That is the speed of a 32-bit PCI bus. I looked at your reference [1]
for the 3132 model. Have you tried it out in practice?
The max you should be able to get out of your raid10 with 8 disks would
then be around 400 - 480 MB/s, for sequential reads. 250 MB/s out of your PCIE
enclosure, or 50 MB/s per disk, and then additional 50 MB/s each of the last
3 disks. You can only multiply the speed of the slowest of the disks
involved by the number of disks. But even then it is not so bad.
For random read it is better yet, given that this is not limited by the
transfer speed of your PCIe controller.
> -Dave
>
> [1] http://ata.wiki.kernel.org/index.php/Hardware,_driver_status#Silicon_Image_3124
* Re: raind-1 resync speed slow down to 50% by the time it finishes
2009-07-31 17:54 ` Keld Jørn Simonsen
@ 2009-07-31 18:10 ` Keld Jørn Simonsen
2009-07-31 20:10 ` David Rees
1 sibling, 0 replies; 20+ messages in thread
From: Keld Jørn Simonsen @ 2009-07-31 18:10 UTC (permalink / raw)
To: David Rees; +Cc: linux-raid
On Fri, Jul 31, 2009 at 07:54:55PM +0200, Keld Jørn Simonsen wrote:
> On Thu, Jul 30, 2009 at 01:11:20PM -0700, David Rees wrote:
> > 2009/7/30 Keld Jørn Simonsen <keld@dkuug.dk>:
> > > I think raid10,f2 only degrades 10-20 % while raid1 can degrade as much
> > > as 50 %. For writing it is about the same, given that you use a file
> > > system on top of the raid.
> >
>
> > Random/small reads far: Up to 100% faster
>
> Actually a bit more, due to that far only uses the fastest half of the
> disks. One test shows 132 % faster, which is consistent with theory.
>
> > Random/small reads near: Up to 100% faster
>
> One test shows 156 % faster.
I meant 56 % faster.
So one test (done by myself) shows far to be 132 % faster than single
disk, and near to be 56 % faster. Given the behaviour of near and far I
believe the tests to be representative of the near/far performance for
random reading.
Best regards
keld
* Re: raind-1 resync speed slow down to 50% by the time it finishes
2009-07-31 17:54 ` Keld Jørn Simonsen
2009-07-31 18:10 ` Keld Jørn Simonsen
@ 2009-07-31 20:10 ` David Rees
2009-08-01 13:00 ` Keld Jørn Simonsen
1 sibling, 1 reply; 20+ messages in thread
From: David Rees @ 2009-07-31 20:10 UTC (permalink / raw)
To: Keld Jørn Simonsen; +Cc: linux-raid
2009/7/31 Keld Jørn Simonsen <keld@dkuug.dk>:
> On Thu, Jul 30, 2009 at 01:11:20PM -0700, David Rees wrote:
>> 2009/7/30 Keld Jørn Simonsen <keld@dkuug.dk>:
>> > I think raid10,f2 only degrades 10-20 % while raid1 can degrade as much
>> > as 50 %. For writing it is about the same, given that you use a file
>> > system on top of the raid.
>>
>> Has anyone done any benchmarks of near vs far setups?
>
> Yes, there are a number of benchmarks on raid10 near/far scenarios
> at http://linux-raid.osdl.org/index.php/Performance
Hmm, don't know how I missed those! Thanks!
>> From what I understand, here's how performance should go for a 2-disk
>> raid10 setup:
>>
>> Streaming/large reads far: Up to 100% faster since reads are striped
>> across both disks
>
> and possibly faster, due to far only using the faster half of the disk
> for reading.
How is it possible to go faster than 2x faster? If the system only
reads from the disk that has the data on the faster half of the disk,
you can't stripe the reads, so you won't see a significant increase in
speed.
Let's use some data from a real disk, the Velociraptor and a 2-disk
array and streaming reads/writes. At the beginning of the disk you
can read about 120MB/s. At the end of the disk, you can read about
80MB/s.
Data on the "beginning" of array, RAID0 = 240MB/s
Data on the "end" of array, RAID0 = 160MB/s.
Data on the "beginning" of array, RAID10,n2 = 120MB/s
Data on the "end" of array, RAID10,n2 = 80MB/s.
Data on the "beginning" of array, RAID10,f2 = 200MB/s
Data on the "end" of array, RAID10,f2 = 200MB/s.
With a f2 setup you'll read at something less than 120+80 = 200MB/s.
So I guess it's a bit more than 100% faster than 80MB/s, in that
situation, but you get less than the peak performance of 240MB/s, so
I'd still call it on average, 100% faster. It may be 120% faster in
some situations, but only 80% faster in others.
>> Streaming/large writes far: Slower than single disk, since disks have
>> to seek to write. How much of a hit in performance will depend on
>> chunk size.
>> Streaming/large writes near: Same as single disk.
>
> Due to the elevator of the file system, writes are about the same for
> both near and far.
Your benchmarks showed about a 13% performance hit for f2 compared to
n2 and RAID1, so I wouldn't quite call it the same. Close, but still
noticeably slower.
>> Random/small reads far: Up to 100% faster
>
> Actually a bit more, due to that far only uses the fastest half of the
> disks. One test shows 132 % faster, which is consistent with theory.
I doubt that is the case on average. Best case, yes. Worst case, no.
I guess I should have said "appx 100% faster" instead of Up to 100%
faster. So we're both right. :-)
>> 1. The array mostly sees write activity, streaming reads aren't that common.
>> 2. I can only get about 120 MB/s out of the external enclosure because
>> of the PCIe card [1] , so being able to stripe reads wouldn't help get
>> any extra performance out of those disks.
>> [1] http://ata.wiki.kernel.org/index.php/Hardware,_driver_status#Silicon_Image_3124
>
> Hmm, a pci-e x1 should be able to get 2.5 Gbit/s = about 300 MB/s.
> Wikipedia says 250 MB/s. It is strange that you only can get 120 MB/s.
> That is the speed of a PCI 32 bit bus. I looked at your reference [1]
> for the 3132 model. Have you tried it out in practice?
Yes, in practice, IO reached exactly 120MB/s out of the controller. I
ran dd read/write tests on individual disks and found that overall
throughput peaked exactly at 120MB/s.
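A test along these lines can be reproduced by reading several member disks in parallel and summing the rates dd reports at the end; the device names are placeholders for the disks behind the controller in question:
    for d in sdX sdY sdZ; do
        dd if=/dev/$d of=/dev/null bs=1M count=2048 iflag=direct &
    done
    wait   # each dd prints its own MB/s; the sum is the controller's aggregate throughput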
> The max you should be able to get out of your raid10 with 8 disks would
> then be around 400 - 480 MB/s, for sequential reads. 250 MB/s out of your PCIE
> enclosure, or 50 MB/s per disk, and then additional 50 MB/s each of the last
> 3 disks. You can only multiply the speed of the slowest of the disks
> involved by the number of disks. But even then it is not so bad.
> For random read it is better yet, given that this is not limited by the
> transfer speed of your PCIe controller.
For streaming reads/writes, I have found I am limited by the average
speed of each disk in the array. Because I am limited to 120 MB/s on the
5-disk enclosure, for writes I'm limited to about 80 MB/s. For reads,
which only have to come from half the disks, I am able to get up to
180 MB/s out of the array. (I did have to use blockdev --setra on
/dev/md0 to increase the readahead size to at least 16MB to get those
numbers.)
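For reference, blockdev takes the readahead value in 512-byte sectors, so 16 MB corresponds to 32768; a sketch, assuming the array is /dev/md0:
    blockdev --getra /dev/md0          # current readahead, in 512-byte sectors
    blockdev --setra 32768 /dev/md0    # 32768 sectors = 16 MiB of readahead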
But the primary reason I built it was to handle lots of small random
writes/reads, so being limited to 120MB/s out of the enclosure isn't
noticeable most of the time in practice as you say.
-Dave
* Re: raind-1 resync speed slow down to 50% by the time it finishes
2009-07-31 20:10 ` David Rees
@ 2009-08-01 13:00 ` Keld Jørn Simonsen
2009-08-01 15:13 ` David Rees
0 siblings, 1 reply; 20+ messages in thread
From: Keld Jørn Simonsen @ 2009-08-01 13:00 UTC (permalink / raw)
To: David Rees; +Cc: linux-raid
On Fri, Jul 31, 2009 at 01:10:37PM -0700, David Rees wrote:
> 2009/7/31 Keld Jørn Simonsen <keld@dkuug.dk>:
> > On Thu, Jul 30, 2009 at 01:11:20PM -0700, David Rees wrote:
> >> 2009/7/30 Keld Jørn Simonsen <keld@dkuug.dk>:
> >> > I think raid10,f2 only degrades 10-20 % while raid1 can degrade as much
> >> > as 50 %. For writing it is about the same, given that you use a file
> >> > system on top of the raid.
> >>
> >> Has anyone done any benchmarks of near vs far setups?
> >
> > Yes, there are a number of benchmarks on raid10 near/far scenarios
> > at http://linux-raid.osdl.org/index.php/Performance
>
> Hmm, don't know how I missed those! Thanks!
>
> >> From what I understand, here's how performance should go for a 2-disk
> >> raid10 setup:
> >>
> >> Streaming/large reads far: Up to 100% faster since reads are striped
> >> across both disks
> >
> > and possibly faster, due to far only using the faster half of the disk
> > for reading.
>
> How is it possible to go faster than 2x faster? If the system only
> reads from the disk that has the data on the faster half of the disk,
> you can't stripe the reads, so you won't see a significant increase in
> speed.
>
> Let's use some data from a real disk, the Velociraptor and a 2-disk
> array and streaming reads/writes. At the beginning of the disk you
> can read about 120MB/s. At the end of the disk, you can read about
> 80MB/s.
This is not actual figures from some benchmarking you did, true?
> Data on the "beginning" of array, RAID0 = 240MB/s
> Data on the "end" of array, RAID0 = 160MB/s.
> Data on the "beginning" of array, RAID10,n2 = 120MB/s
> Data on the "end" of array, RAID10,n2 = 80MB/s.
> Data on the "beginning" of array, RAID10,f2 = 200MB/s
Should be:
Data on the "beginning" of array, RAID10,f2 = 230MB/s
You can get about 95 % of the theoretical max out of raid10,f2,
according to a number of tests.
> Data on the "end" of array, RAID10,f2 = 200MB/s.
Yes, the "end" here is at the half of the disk, so 200 MB/s is likely
for raid10,f2.
> With a f2 setup you'll read at something less than 120+80 = 200MB/s.
When? at the beginning or the end?
> So I guess it's a bit more than 100% faster than 80MB/s, in that
> situation, but you get less than the peak performance of 240MB/s, so
> I'd still call it on average, 100% faster. It may be 120% faster in
> some situations, but only 80% faster in others.
I was talking about random read speeds, not sequential read speeds.
Random read performance on a single disk, in one test, was 34 MB/s, while
sequential read on the same disk was 82 MB/s. In raid10,f2 with 2 disks,
random read was 79 MB/s. This is 235 % of the random read on one disk
(34 MB/s). This is a likely result, as you should expect a doubling in
speed from the 2 disks, plus some additional speed from the faster sectors
of the outer disk tracks, and shorter access times on those outer sectors.
Geometry says that on average the transfer speed is about 17 % higher on
the outer half of the disk than over the whole disk, which gives some
235 % speed improvement (2 * 1.17). The shorter head movement should also
give a little, but maybe the elevator algorithm of the file system
eliminates most of that factor.
> >> Streaming/large writes far: Slower than single disk, since disks have
> >> to seek to write. How much of a hit in performance will depend on
> >> chunk size.
> >> Streaming/large writes near: Same as single disk.
> >
> > Due to the elevator of the file system, writes are about the same for
> > both near and far.
>
> Your benchmarks showed about a 13% performance hit for f2 compared to
> n2 and RAID1, so I wouldn't quite call it the same. Close, but still
> noticeably slower.
Nah, for random write, MB/s:
raid1 55
raid10,n2 48
raid10,f2 55
So raid10,f2 and raid1 are the same, raid10,n2 13 % slower.
In theory the elevator should even this out for all mirrored raid types.
Single disk speed and raid1 and raid10,f2 speeds were identical, as
theory also would have it, for random writes.
>
> >> Random/small reads far: Up to 100% faster
> >
> > Actually a bit more, due to that far only uses the fastest half of the
> > disks. One test shows 132 % faster, which is consistent with theory.
>
> I doubt that is the case on average. Best case, yes. Worst case, no.
> I guess I should have said "appx 100% faster" instead of Up to 100%
> faster. So we're both right. :-)
I would claim that 132 % is consistent with theory, as explained above.
And as this is based on pure geometry, on a 3.5 " disk with standard
inner and outer radius, the figure is a general fixed result.
> >> 1. The array mostly sees write activity, streaming reads aren't that common.
> >> 2. I can only get about 120 MB/s out of the external enclosure because
> >> of the PCIe card [1] , so being able to stripe reads wouldn't help get
> >> any extra performance out of those disks.
> >> [1] http://ata.wiki.kernel.org/index.php/Hardware,_driver_status#Silicon_Image_3124
> >
> > Hmm, a pci-e x1 should be able to get 2.5 Gbit/s = about 300 MB/s.
> > Wikipedia says 250 MB/s. It is strange that you only can get 120 MB/s.
> > That is the speed of a PCI 32 bit bus. I looked at your reference [1]
> > for the 3132 model. Have you tried it out in practice?
>
> Yes, in practice, IO reached exactly 120MB/s out of the controller. I
> ran dd read/write tests on individual disks and found that overall
> throughput peaked exactly at 120MB/s.
Hmm, get another controller, then. A cheap PCIe controller should be able
to do about 300 MB/s on an x1 PCIe link.
>
> > The max you should be able to get out of your raid10 with 8 disks would
> > then be around 400 - 480 MB/s, for sequential reads. 250 MB/s out of your PCIE
> > enclosure, or 50 MB/s per disk, and then additional 50 MB/s each of the last
> > 3 disks. You can only multiply the speed of the slowest of the disks
> > involved by the number of disks. But even then it is not so bad.
> > For random read it is better yet, given that this is not limited by the
> > transfer speed of your PCIe controller.
>
> For streaming reads/writes, I have found I am limited by the average
> speed of each disk in the array. Because I am limited to 120 MB/s on the
> 5-disk enclosure, for writes I'm limited to about 80 MB/s. For reads,
> which only have to come from half the disks, I am able to get up to
> 180 MB/s out of the array. (I did have to use blockdev --setra on
> /dev/md0 to increase the readahead size to at least 16MB to get those
> numbers.)
yes, this is a common trick.
> But the primary reason I built it was to handle lots of small random
> writes/reads, so being limited to 120MB/s out of the enclosure isn't
> noticeable most of the time in practice as you say.
Yes, for random read/write you only get something like 45 % out of the
max transfer bandwidth. So 120 MB/s would be close to the max that your
5 disks on the PCIe controller can deliver. With a faster PCIe
controller you should be able to get better performance on random reads
with raid10,f2. Anyway 180 MB/s may be fast enough for your application.
best regards
keld
* Re: raind-1 resync speed slow down to 50% by the time it finishes
2009-08-01 13:00 ` Keld Jørn Simonsen
@ 2009-08-01 15:13 ` David Rees
2009-08-01 17:57 ` Keld Jørn Simonsen
0 siblings, 1 reply; 20+ messages in thread
From: David Rees @ 2009-08-01 15:13 UTC (permalink / raw)
To: Keld Jørn Simonsen; +Cc: linux-raid
2009/8/1 Keld Jørn Simonsen <keld@dkuug.dk>:
> On Fri, Jul 31, 2009 at 01:10:37PM -0700, David Rees wrote:
>> Let's use some data from a real disk, the Velociraptor and a 2-disk
>> array and streaming reads/writes. At the beginning of the disk you
>> can read about 120MB/s. At the end of the disk, you can read about
>> 80MB/s.
>
> This is not actual figures from some benchmarking you did, true?
Those are actual numbers from a Velociraptor, but the numbers are just
estimates.
>> Data on the "beginning" of array, RAID0 = 240MB/s
>> Data on the "end" of array, RAID0 = 160MB/s.
>> Data on the "beginning" of array, RAID10,n2 = 120MB/s
>> Data on the "end" of array, RAID10,n2 = 80MB/s.
>> Data on the "beginning" of array, RAID10,f2 = 200MB/s
>
> Should be:
>
> Data on the "beginning" of array, RAID10,f2 = 230MB/s
No - you're getting 120 MB/s from one disk and 80MB/s from another.
How that would add up to 230MB/s defies logic...
>> With a f2 setup you'll read at something less than 120+80 = 200MB/s.
>
> When? at the beginning or the end?
The whole thing, on average. But the whole point of f2 is to even out
performance from beginning of the array and let you stripe reads.
> Random read performance on a single disk, in one test, was 34 MB/s, while
> sequential read on the same disk was 82 MB/s. In raid10,f2 with 2 disks,
> random read was 79 MB/s. This is 235 % of the random read on one disk
> (34 MB/s). This is a likely result, as you should expect a doubling in
> speed from the 2 disks, plus some additional speed from the faster sectors
> of the outer disk tracks, and shorter access times on those outer sectors.
> Geometry says that on average the transfer speed is about 17 % higher on
> the outer half of the disk than over the whole disk, which gives some
> 235 % speed improvement (2 * 1.17). The shorter head movement should also
> give a little, but maybe the elevator algorithm of the file system
> eliminates most of that factor.
Sorry - I'm having a hard time wrapping my head around the idea that you
can simply ignore access to the slow half of the disk in a multi-threaded
random IO test. The only way I might believe that you can get a 235%
improvement is in a single-threaded test with a queue depth of 1, which
lets the f2 setup use only the fast half of the disks. If that is your
assumption, then OK. But then getting 34 MB/s out of a rotating disk
isn't random IO, either. Random IO on a rotating disk is normally an
order of magnitude slower.
>> Your benchmarks showed about a 13% performance hit for f2 compared to
>> n2 and RAID1, so I wouldn't quite call it the same. Close, but still
>> noticeably slower.
>
> Nah, for random write, MB/s:
>
> raid1 55
> raid10,n2 48
> raid10,f2 55
Sorry - must have read them wrong again.
>> >> 1. The array mostly sees write activity, streaming reads aren't that common.
>> >> 2. I can only get about 120 MB/s out of the external enclosure because
>> >> of the PCIe card [1] , so being able to stripe reads wouldn't help get
>> >> any extra performance out of those disks.
>> >> [1] http://ata.wiki.kernel.org/index.php/Hardware,_driver_status#Silicon_Image_3124
>> >
>> > Hmm, a pci-e x1 should be able to get 2.5 Gbit/s = about 300 MB/s.
>> > Wikipedia says 250 MB/s. It is strange that you only can get 120 MB/s.
>> > That is the speed of a PCI 32 bit bus. I looked at your reference [1]
>> > for the 3132 model. Have you tried it out in practice?
>>
>> Yes, in practice, IO reached exactly 120MB/s out of the controller. I
>> ran dd read/write tests on individual disks and found that overall
>> throughput peaked exactly at 120MB/s.
>
> Hmm, get another controller, then. A cheap PCIe controller should be able
> to do about 300 MB/s on a x1 PCIe.
Please read my reference again. It's a motherboard limitation. I
already _have_ a good, cheap PCIe controller.
>> But the primary reason I built it was to handle lots of small random
>> writes/reads, so being limited to 120MB/s out of the enclosure isn't
>> noticeable most of the time in practice as you say.
>
> Yes, for random read/write you only get something like 45 % out of the
> max transfer bandwidth. So 120 MB/s would be close to the max that your
> 5 disks on the PCIe controller can deliver. With a faster PCIe
> controller you should be able to get better performance on random reads
> with raid10,f2. Anyway 180 MB/s may be fast enough for your application.
Again - your idea of "random" IO is completely different than mine.
My random IO workloads can only get a couple MB/s out of a single
disk.
Here's a benchmark which tests SSDs and rotational disks. All the
rotational disks are getting less than 1MB/s in the random IO test.
http://www.anandtech.com/storage/showdoc.aspx?i=3531&p=25 It's a
worst case scenario, but not far from my workloads which obviously
read a bit more data on each read.
-Dave
* Re: raind-1 resync speed slow down to 50% by the time it finishes
2009-08-01 15:13 ` David Rees
@ 2009-08-01 17:57 ` Keld Jørn Simonsen
2009-08-04 22:21 ` David Rees
0 siblings, 1 reply; 20+ messages in thread
From: Keld Jørn Simonsen @ 2009-08-01 17:57 UTC (permalink / raw)
To: David Rees; +Cc: linux-raid
On Sat, Aug 01, 2009 at 08:13:45AM -0700, David Rees wrote:
> 2009/8/1 Keld Jørn Simonsen <keld@dkuug.dk>:
> > On Fri, Jul 31, 2009 at 01:10:37PM -0700, David Rees wrote:
> >> Let's use some data from a real disk, the Velociraptor and a 2-disk
> >> array and streaming reads/writes. At the beginning of the disk you
> >> can read about 120MB/s. At the end of the disk, you can read about
> >> 80MB/s.
> >
> > This is not actual figures from some benchmarking you did, true?
>
> Those are actual numbers from a Velociraptor, but the numbers are just
> estimates.
>
> >> Data on the "beginning" of array, RAID0 = 240MB/s
> >> Data on the "end" of array, RAID0 = 160MB/s.
> >> Data on the "beginning" of array, RAID10,n2 = 120MB/s
> >> Data on the "end" of array, RAID10,n2 = 80MB/s.
> >> Data on the "beginning" of array, RAID10,f2 = 200MB/s
> >
> > Should be:
> >
> > Data on the "beginning" of array, RAID10,f2 = 230MB/s
>
> No - you're getting 120 MB/s from one disk and 80MB/s from another.
> How that would add up to 230MB/s defies logic...
Why only 80 MB/s when reading? With raid10,f2, reads from both disks are
done at the beginning of both disks, thus getting about 115 MB/s from each
of them.
> >> With a f2 setup you'll read at something less than 120+80 = 200MB/s.
> >
> > When? at the beginning or the end?
>
> The whole thing, on average. But the whole point of f2 is to even out
> performance from beginning of the array and let you stripe reads.
>
> > Random read performance on a single disk, in one test, was 34 MB/s, while
> > sequential read on the same disk was 82 MB/s. In raid10,f2 with 2 disks,
> > random read was 79 MB/s. This is 235 % of the random read on one disk
> > (34 MB/s). This is a likely result, as you should expect a doubling in
> > speed from the 2 disks, plus some additional speed from the faster sectors
> > of the outer disk tracks, and shorter access times on those outer sectors.
> > Geometry says that on average the transfer speed is about 17 % higher on
> > the outer half of the disk than over the whole disk, which gives some
> > 235 % speed improvement (2 * 1.17). The shorter head movement should also
> > give a little, but maybe the elevator algorithm of the file system
> > eliminates most of that factor.
>
> Sorry - I'm having a hard time wrapping my head around that you can
> simply ignore access to the slow half the disk in a multi-threaded
> random IO test.
Reading in raid10,f2 is restricted to the faster half of the disk, by
design.
It is different when writing: there both halves, fast and slow, are
used.
> The only way I might believe that you can get 235%
> improvement is in a single threaded test with a queue depth of 1 which
> lets the f2 setup only use the fast half the disks.
The test was a multi-threaded one, with many processes running, say
about 200 processes. The test was set up to mimic an FTP mirror.
> If that is your
> assumption, then OK. But then getting 34 MB/s out of a rotating
> disk isn't random IO, either. Random IO on a rotating disk is
> normally an order of magnitude slower.
Agreed. The 34 MB/s is random IO in a multi-threaded environment, with an
elevator algorithm in operation.
If you only do the individual random reads in a single thread, it
would be much slower. However, the same speedups will occur for
raid10,f2: there will be a doubling from reading from 2 disks at the
same time, and only using the faster half of the disks gives both a
better overall transfer rate and quicker access times.
> >> >> 1. The array mostly sees write activity, streaming reads aren't that common.
> >> >> 2. I can only get about 120 MB/s out of the external enclosure because
> >> >> of the PCIe card [1] , so being able to stripe reads wouldn't help get
> >> >> any extra performance out of those disks.
> >> >> [1] http://ata.wiki.kernel.org/index.php/Hardware,_driver_status#Silicon_Image_3124
> >> >
> >> > Hmm, a pci-e x1 should be able to get 2.5 Gbit/s = about 300 MB/s.
> >> > Wikipedia says 250 MB/s. It is strange that you only can get 120 MB/s.
> >> > That is the speed of a PCI 32 bit bus. I looked at your reference [1]
> >> > for the 3132 model. Have you tried it out in practice?
> >>
> >> Yes, in practice, IO reached exactly 120MB/s out of the controller. I
> >> ran dd read/write tests on individual disks and found that overall
> >> throughput peaked exactly at 120MB/s.
> >
> > Hmm, get another controller, then. A cheap PCIe controller should be able
> > to do about 300 MB/s on a x1 PCIe.
>
> Please read my reference again. It's a motherboard limitation. I
> already _have_ a good, cheap PCIe controller.
OK, I read:
[1] http://ata.wiki.kernel.org/index.php/Hardware,_driver_status#Silicon_Image_3124
as being the description of the PCIe controller, specifically the SIL 3132,
and that it was the controller that was restricted to 120 MB/s - not the
mobo. Anyway, you could get a new mobo; they are cheap these days, and
many of them come with either 4 or 8 SATA interfaces. If you have bought
Velociraptors then it must be for the speed, and quite a cheap mobo could
enhance your performance considerably.
> >> But the primary reason I built it was to handle lots of small random
> >> writes/reads, so being limited to 120MB/s out of the enclosure isn't
> >> noticeable most of the time in practice as you say.
> >
> > Yes, for random read/write you only get something like 45 % out of the
> > max transfer bandwidth. So 120 MB/s would be close to the max that your
> > 5 disks on the PCIe controller can deliver. With a faster PCIe
> > controller you should be able to get better performance on random reads
> > with raid10,f2. Anyway 180 MB/s may be fast enough for your application.
>
> Again - your idea of "random" IO is completely different than mine.
> My random IO workloads can only get a couple MB/s out of a single
> disk.
yes, it seems we have different usage scenarios. I am serving reasonably
big files, say 700 MB ISO images, or .rpm packages of several MBs, you are
probably doing some database access.
> Here's a benchmark which tests SSDs and rotational disks. All the
> rotational disks are getting less than 1MB/s in the random IO test.
> http://www.anandtech.com/storage/showdoc.aspx?i=3531&p=25 It's a
> worst case scenario, but not far from my workloads which obviously
> read a bit more data on each read.
What are your average read or write block sizes? Is it some database
usage?
best regards
keld
* Re: raind-1 resync speed slow down to 50% by the time it finishes
2009-08-01 17:57 ` Keld Jørn Simonsen
@ 2009-08-04 22:21 ` David Rees
2009-08-04 23:18 ` John Robinson
2009-08-05 7:44 ` Keld Jørn Simonsen
0 siblings, 2 replies; 20+ messages in thread
From: David Rees @ 2009-08-04 22:21 UTC (permalink / raw)
To: Keld Jørn Simonsen; +Cc: linux-raid
2009/8/1 Keld Jørn Simonsen <keld@dkuug.dk>:
> On Sat, Aug 01, 2009 at 08:13:45AM -0700, David Rees wrote:
>> No - you're getting 120 MB/s from one disk and 80MB/s from another.
>> How that would add up to 230MB/s defies logic...
>
> Why only 80 MB/s when reading? With raid10,f2, reads from both disks are
> done at the beginning of both disks, thus getting about 115 MB/s from each
> of them.
>
> Reading in raid10,f2 is restricted to the faster half of the disk, by
> design.
>
> It is different when writing: there both halves, fast and slow, are
> used.
As I mentioned earlier, I was having a hard time visualizing the data
layout. So here's a simple diagram that shows near/far layout and why
Keld was right - with a far layout, reads can be isolated to the fast
half of the disk.
It also shows how sequential writes (or any other write that spans
multiple chunks) force the drives to seek half way across the disk for
each write.
Near layout, 4 disks, 2 copies:
a b c d
0 0 1 1
2 2 3 3
4 4 5 5
6 6 7 7
Far layout, 4 disks, 2 copies
a b c d
0 1 2 3
4 5 6 7
7 0 1 2
3 4 5 6
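Which layout an existing array actually uses can be read back from mdadm (the device name is a placeholder):
    mdadm --detail /dev/md0 | grep -i layout    # prints e.g. "Layout : near=2" or "far=2"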
>> >> > Hmm, a pci-e x1 should be able to get 2.5 Gbit/s = about 300 MB/s.
>> >> > Wikipedia says 250 MB/s. It is strange that you only can get 120 MB/s.
>> >> > That is the speed of a PCI 32 bit bus. I looked at your reference [1]
>> >> > for the 3132 model. Have you tried it out in practice?
>> >>
>> >> Yes, in practice, IO reached exactly 120MB/s out of the controller. I
>> >> ran dd read/write tests on individual disks and found that overall
>> >> throughput peaked exactly at 120MB/s.
>> >
>> > Hmm, get another controller, then. A cheap PCIe controller should be able
>> > to do about 300 MB/s on a x1 PCIe.
>>
>> Please read my reference again. It's a motherboard limitation. I
>> already _have_ a good, cheap PCIe controller.
>
> OK, I read:
> [1] http://ata.wiki.kernel.org/index.php/Hardware,_driver_status#Silicon_Image_3124
> as being the description of the PCIe controller, especially SIL 3132 -
> the PCIe controller. And that this was restricted to 120 MB/s - not the
> mobo. Anuway, yuo could get a new mobo, they are cheap these days and
> many of them come with either 4 or 8 SATA interfaces. If you have bought
> Velociraptors then it must be for the speed, and quite cheap mobos could
> enhance your performance considerably.
Hah, I wish I had Velociraptors - I was only using those as an example
since I happened to have the IO rate charts handy. :-) And also as I
mentioned, streaming IO is rare on this server - current throughput of
the setup is more than enough for the workload, especially considering
how little we spent on building the array.
I built my particular system on a severe budget using 8 spare 80GB
drives and a $165 5-bay external enclosure since the chassis doesn't
have room for more than 4 drives. Spending a minimum of $600-750 on a
bare-bones motherboard/CPU/memory upgrade is not in the budget at this
time. The existing server is an old dual Xeon 3.2 GHz system with 8GB
RAM, and I would want to upgrade if I'm going to spend any more and
get something at least twice as fast meaning quad-core 3GHz+, 12-16GB
RAM, etc...
> yes, it seems we have different usage scenarios. I am serving reasonably
> big files, say 700 MB ISO images, or .rpm packages of several MBs, you are
> probably doing some database access.
Yes, completely different. You are working with mostly sequential
reads/writes on large files.
>> Here's a benchmark which tests SSDs and rotational disks. All the
>> rotational disks are getting less than 1MB/s in the random IO test.
>> http://www.anandtech.com/storage/showdoc.aspx?i=3531&p=25 It's a
>> worst case scenario, but not far from my workloads which obviously
>> read a bit more data on each read.
>
> What are your average read or write block sizes? Is it some database
> usage?
Typical writes are very small - a few kB - database and application logs.
-Dave
* Re: raind-1 resync speed slow down to 50% by the time it finishes
2009-08-04 22:21 ` David Rees
@ 2009-08-04 23:18 ` John Robinson
2009-08-04 23:42 ` David Rees
2009-08-05 8:08 ` Keld Jørn Simonsen
2009-08-05 7:44 ` Keld Jørn Simonsen
1 sibling, 2 replies; 20+ messages in thread
From: John Robinson @ 2009-08-04 23:18 UTC (permalink / raw)
To: David Rees; +Cc: Keld Jørn Simonsen, linux-raid
On 04/08/2009 23:21, David Rees wrote:
[...]
> As I mentioned earlier, I was having a hard time visualizing the data
> layout. So here's a simple diagram that shows near/far layout and why
> Keld was right - with a far layout, reads can be isolated to the fast
> half of the disk.
[...]
> Near layout, 4 disks, 2 copies:
> a b c d
> 0 0 1 1
> 2 2 3 3
> 4 4 5 5
> 6 6 7 7
>
> Far layout, 4 disks, 2 copies
> a b c d
> 0 1 2 3
> 4 5 6 7
> 7 0 1 2
> 3 4 5 6
But I don't think I'd want reads isolated to the first half of the disc.
If I wanted a block read, and the drive which has its near copy is
already busy, but the drive with the far copy is idle, I'd probably
rather the read came from the far copy, than wait for the drive with the
near copy to come free.
For example, say I want block 0, and there's a write pending for block
3. I want block 0 from drive b now, not drive a later.
Or will I actually get more IOPS by waiting, if I'm doing a lot of small
reads and writes?
Cheers,
John.
* Re: raind-1 resync speed slow down to 50% by the time it finishes
2009-08-04 23:18 ` John Robinson
@ 2009-08-04 23:42 ` David Rees
2009-08-05 8:20 ` Keld Jørn Simonsen
2009-08-05 8:08 ` Keld Jørn Simonsen
1 sibling, 1 reply; 20+ messages in thread
From: David Rees @ 2009-08-04 23:42 UTC (permalink / raw)
To: John Robinson; +Cc: Keld Jørn Simonsen, linux-raid
On Tue, Aug 4, 2009 at 4:18 PM, John
Robinson<john.robinson@anonymous.org.uk> wrote:
> On 04/08/2009 23:21, David Rees wrote:
>> Near layout, 4 disks, 2 copies:
>> a b c d
>> 0 0 1 1
>> 2 2 3 3
>> 4 4 5 5
>> 6 6 7 7
>>
>> Far layout, 4 disks, 2 copies
>> a b c d
>> 0 1 2 3
>> 4 5 6 7
>> 7 0 1 2
>> 3 4 5 6
>
> But I don't think I'd want reads isolated to the first half of the disc. If
> I wanted a block read, and the drive which has its near copy is already
> busy, but the drive with the far copy is idle, I'd probably rather the read
> came from the far copy, than wait for the drive with the near copy to come
> free.
>
> For example, say I want block 0, and there's a write pending for block 3. I
> want block 0 from drive b now, not drive a later.
In that case, it seems that it should be fastest to retrieve it from
the idle disk b, even if it has to go to the slower half of the disk.
After all, any write will take at least as long as the read will.
> Or will I actually get more IOPS by waiting, if I'm doing a lot of small
> reads and writes?
I doubt it - if you're doing a lot of small reads and writes, during
the writes the heads will be frequently seeking to the slow half of
the disk, anyway.
Of course, benchmarks would tell for sure. ;-)
-Dave
* Re: raind-1 resync speed slow down to 50% by the time it finishes
2009-08-04 22:21 ` David Rees
2009-08-04 23:18 ` John Robinson
@ 2009-08-05 7:44 ` Keld Jørn Simonsen
2009-08-05 8:18 ` NeilBrown
1 sibling, 1 reply; 20+ messages in thread
From: Keld Jørn Simonsen @ 2009-08-05 7:44 UTC (permalink / raw)
To: David Rees; +Cc: linux-raid
On Tue, Aug 04, 2009 at 03:21:28PM -0700, David Rees wrote:
> 2009/8/1 Keld Jørn Simonsen <keld@dkuug.dk>:
> > On Sat, Aug 01, 2009 at 08:13:45AM -0700, David Rees wrote:
> >> No - you're getting 120 MB/s from one disk and 80MB/s from another.
> >> How that would add up to 230MB/s defies logic...
> >
> > Why only 80 MB/ when reading? reading from both disks with raid10,f2 are done at the
> > beginning of both disks, thus getting about 115 MB/s from both of them.
> >
> > reading in raid10,f2 is restricted to the faster half of the disk, by
> > design.
> >
> > It is different when writing. there both halves, fast and slow, are
> > used.
>
> As I mentioned earlier, I was having a hard time visualizing the data
> layout. So here's a simple diagram that shows near/far layout and why
> Keld was right - with a far layout, reads can be isolated to the fast
> half of the disk.
>
> It also shows how sequential writes (or any other write that spans
> multiple chunks) force the drives to seek half way across the disk for
> each write.
>
> Near layout, 4 disks, 2 copies:
> a b c d
> 0 0 1 1
> 2 2 3 3
> 4 4 5 5
> 6 6 7 7
>
> Far layout, 4 disks, 2 copies
> a b c d
> 0 1 2 3
> 4 5 6 7
> 7 0 1 2
> 3 4 5 6
No, it is rather:
Far layout, 4 disks, 2 copies
a b c d
0 1 2 3
4 5 6 7
1 0 3 2
5 4 7 6
Best regards
keld
* Re: raind-1 resync speed slow down to 50% by the time it finishes
2009-08-04 23:18 ` John Robinson
2009-08-04 23:42 ` David Rees
@ 2009-08-05 8:08 ` Keld Jørn Simonsen
1 sibling, 0 replies; 20+ messages in thread
From: Keld Jørn Simonsen @ 2009-08-05 8:08 UTC (permalink / raw)
To: John Robinson; +Cc: David Rees, linux-raid
On Wed, Aug 05, 2009 at 12:18:06AM +0100, John Robinson wrote:
> On 04/08/2009 23:21, David Rees wrote:
> [...]
>> As I mentioned earlier, I was having a hard time visualizing the data
>> layout. So here's a simple diagram that shows near/far layout and why
>> Keld was right - with a far layout, reads can be isolated to the fast
>> half of the disk.
> [...]
>> Near layout, 4 disks, 2 copies:
>> a b c d
>> 0 0 1 1
>> 2 2 3 3
>> 4 4 5 5
>> 6 6 7 7
>>
>> Far layout, 4 disks, 2 copies
>> a b c d
>> 0 1 2 3
>> 4 5 6 7
>> 7 0 1 2
>> 3 4 5 6
>
> But I don't think I'd want reads isolated to the first half of the disc.
> If I wanted a block read, and the drive which has its near copy is
> already busy, but the drive with the far copy is idle, I'd probably
> rather the read came from the far copy, than wait for the drive with the
> near copy to come free.
>
> For example, say I want block 0, and there's a write pending for block
> 3. I want block 0 from drive b now, not drive a later.
That is not how IO works for disks. There is a queue for each physical
disk, and your reads are put into that queue. An elevator algorithm then
orders the requests in the queue, normally by taking the requests in the
sequence of their sector numbers. In this way the head movement is
minimized, speeding up total transfers by a factor of maybe 4.
So there is no such thing as getting IO "now"; you are always in a
queue, at least on a system with some load. Systems without load are
probably not interesting - there you get your data immediately.
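The elevator described here is the kernel's per-device IO scheduler; which one a disk uses can be inspected and changed through sysfs (the device name is a placeholder, and the schedulers offered depend on the kernel build):
    cat /sys/block/sdX/queue/scheduler              # the active scheduler is shown in brackets
    echo deadline > /sys/block/sdX/queue/scheduler  # switch scheduler for that disk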
> Or will I actually get more IOPS by waiting, if I'm doing a lot of small
> reads and writes?
Yes, you get many times more IOPS by having an elevator algorithm in
place, also with small reads and writes. Confining the reads to only
the faster half gets you even more speed improvement, also for IOPS.
Maybe not that much - you save a little more than half of the total head
movement from start to end per run of the elevator queue, maybe about 5 %.
best regards
keld
* Re: raind-1 resync speed slow down to 50% by the time it finishes
2009-08-05 7:44 ` Keld Jørn Simonsen
@ 2009-08-05 8:18 ` NeilBrown
0 siblings, 0 replies; 20+ messages in thread
From: NeilBrown @ 2009-08-05 8:18 UTC (permalink / raw)
To: Keld Jørn Simonsen; +Cc: David Rees, linux-raid
On Wed, August 5, 2009 5:44 pm, Keld Jørn Simonsen wrote:
> On Tue, Aug 04, 2009 at 03:21:28PM -0700, David Rees wrote:
>> Far layout, 4 disks, 2 copies
>> a b c d
>> 0 1 2 3
>> 4 5 6 7
>> 7 0 1 2
>> 3 4 5 6
>
> No, it is rather:
>
> Far layout, 4 disks, 2 copies
> a b c d
> 0 1 2 3
> 4 5 6 7
> 1 0 3 2
> 5 4 7 6
Actually it is
a b c d
0 1 2 3
4 5 6 7
3 0 1 2
7 4 5 6
NeilBrown
>
> Best regards
> keld
* Re: raind-1 resync speed slow down to 50% by the time it finishes
2009-08-04 23:42 ` David Rees
@ 2009-08-05 8:20 ` Keld Jørn Simonsen
0 siblings, 0 replies; 20+ messages in thread
From: Keld Jørn Simonsen @ 2009-08-05 8:20 UTC (permalink / raw)
To: David Rees; +Cc: John Robinson, linux-raid
On Tue, Aug 04, 2009 at 04:42:10PM -0700, David Rees wrote:
> On Tue, Aug 4, 2009 at 4:18 PM, John
> Robinson<john.robinson@anonymous.org.uk> wrote:
> > On 04/08/2009 23:21, David Rees wrote:
> >> Near layout, 4 disks, 2 copies:
> >> a b c d
> >> 0 0 1 1
> >> 2 2 3 3
> >> 4 4 5 5
> >> 6 6 7 7
> >>
> >> Far layout, 4 disks, 2 copies
> >> a b c d
> >> 0 1 2 3
> >> 4 5 6 7
> >> 7 0 1 2
> >> 3 4 5 6
> >
> > But I don't think I'd want reads isolated to the first half of the disc. If
> > I wanted a block read, and the drive which has its near copy is already
> > busy, but the drive with the far copy is idle, I'd probably rather the read
> > came from the far copy, than wait for the drive with the near copy to come
> > free.
> >
> > For example, say I want block 0, and there's a write pending for block 3. I
> > want block 0 from drive b now, not drive a later.
>
> In that case, it seems that it should be fastest to retrieve it from
> the idle disk b, even if it has to go to the slower half of the disk.
> After all, any write will take at least as long as the read will.
You seem to not appreciate the elevator queue as I explained it in a
previous message. Disks are not idle, they are always servicing their
elevator queue.
> > Or will I actually get more IOPS by waiting, if I'm doing a lot of small
> > reads and writes?
>
> I doubt it - if you're doing a lot of small reads and writes, during
> the writes the heads will be frequently seeking to the slow half of
> the disk, anyway.
Writes are only done when the caches are flushed, like every 30
secs or 5 secs or so, and the writes will be ordered by the elevator
algorithm to minimize the seeks very considerably. Given that the IO is
random, in theory there should not be any difference between the ordered
random queues of raid10,f2 and raid10,n2.
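The flush intervals mentioned here correspond to the kernel's dirty-page writeback settings, which can be inspected under /proc (values are in centiseconds):
    cat /proc/sys/vm/dirty_expire_centisecs      # how old dirty data may get before it must be written out (default 3000 = 30 s)
    cat /proc/sys/vm/dirty_writeback_centisecs   # how often the writeback threads wake up (default 500 = 5 s)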
> Of course, benchmarks would tell for sure. ;-)
Sure!
Best regards
keld