public inbox for linux-kernel@vger.kernel.org
* raid1 performance
@ 2002-04-30 12:23 Jaime Medrano
  2002-04-30 12:38 ` Arjan van de Ven
  0 siblings, 1 reply; 9+ messages in thread
From: Jaime Medrano @ 2002-04-30 12:23 UTC
  To: linux-kernel

I have several RAID arrays (levels 0 and 1) in my machine, and I have
noticed that raid1 is much slower than I expected.

The arrays are built from two identical hard disks (/dev/hde and /dev/hdg).
Their read throughput is:

/dev/hde: 29 MB/s
/dev/hdg: 29 MB/s
/dev/md0: 27 MB/s (raid1)
/dev/md1: 56 MB/s (raid0)
/dev/md2: 27 MB/s (raid1)

These numbers come from hdparm -tT. I have also noticed very poor
performance when sequentially reading a large file from raid1 (I suppose
this is what hdparm does).

I have looked at the read balancing code in raid1.c and found that no
balancing is done for sequential reads, so all the reading is done from
only one of the mirrors while the others sit idle.

I have tried modifying the balancing algorithm so that it also balances
sequential access, but I got almost the same numbers.

I suspect the reason is that some layer below is issuing reads larger
than the chunks I balance on, so the same work is being done twice; but
I don't know how to verify this.

Does anybody know how this works?

Regards,
Jaime Medrano




* Re: raid1 performance
  2002-04-30 12:23 raid1 performance Jaime Medrano
@ 2002-04-30 12:38 ` Arjan van de Ven
  2002-04-30 14:21   ` Kent Borg
  0 siblings, 1 reply; 9+ messages in thread
From: Arjan van de Ven @ 2002-04-30 12:38 UTC
  To: Jaime Medrano; +Cc: linux-kernel

Jaime Medrano wrote:
> 
> I have several RAID arrays (levels 0 and 1) in my machine, and I have
> noticed that raid1 is much slower than I expected.
> 
> The arrays are built from two identical hard disks (/dev/hde and /dev/hdg).
> Their read throughput is:
> 
> /dev/hde: 29 MB/s
> /dev/hdg: 29 MB/s
> /dev/md0: 27 MB/s (raid1)
> /dev/md1: 56 MB/s (raid0)
> /dev/md2: 27 MB/s (raid1)
> 
> These numbers come from hdparm -tT. I have also noticed very poor
> performance when sequentially reading a large file from raid1 (I suppose
> this is what hdparm does).
> 
> I have looked at the read balancing code in raid1.c and found that no
> balancing is done for sequential reads, so all the reading is done from
> only one of the mirrors while the others sit idle.

Yes, this is expected. Sequential reads from RAID1 with the
current on-disk format are only as fast as the fastest disk.
The reason for this is simple:

<ascii art of the on disk layout, each letter is a "block">

Disk 1:  ABCDEFGHIJK
Disk 2:  ABCDEFGHIJK

If you read block A from disk 1, then to get more than the speed of a
single disk you would need to read block B from disk 2 *in parallel*.
So far so good. However, you then need to read block C, and to do that
in parallel you need to read it from disk 1. But disk 1's head was at
block A, so you get a head seek. Or, if the drive is trying to be
intelligent, it will read block B into its own cache anyway and then
block C after that (which is the more common case), and so on.

This latter case effectively means that disk 1 will still read ALL
blocks from the platter into the drive's cache, and of course disk 2
will do likewise. In just about all cases you care about, the platter
transfer rate is the limiting factor, not the "disk to host" rate. So
both disk 1 and disk 2 are reading ALL the data at platter speed, which
means the maximum speed at which you can get the data is platter speed.

Now if the disk wasn't smart and was doing seeks, it would suck much,
much more due to the high cost of seeks....
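
The reasoning above can be checked with a toy timing model. This is a
minimal sketch under stated assumptions, not a measurement: one block
takes one time unit to pass under the head, `seek_cost` is a
hypothetical per-seek penalty in the same units, and all function names
are illustrative.

```python
def single_disk_time(n_blocks):
    """One disk reads every block in order: one time unit per block."""
    return n_blocks


def alternating_read_through(n_blocks, n_disks=2):
    """Blocks alternate between mirrors; each drive reads through the gaps.

    Disk d wants blocks d, d + n_disks, ... but its head still sweeps over
    every block between its first and last wanted block.
    """
    times = []
    for d in range(n_disks):
        wanted = range(d, n_blocks, n_disks)
        if len(wanted) == 0:
            times.append(0)
        else:
            times.append(wanted[-1] - wanted[0] + 1)
    return max(times)


def alternating_with_seeks(n_blocks, seek_cost, n_disks=2):
    """Same split, but each drive seeks over the unwanted block instead."""
    times = []
    for d in range(n_disks):
        wanted = len(range(d, n_blocks, n_disks))
        # One unit per wanted block, plus one seek between consecutive reads.
        times.append(wanted + max(wanted - 1, 0) * seek_cost)
    return max(times)
```

With 1000 blocks the read-through case costs 999 units against 1000 for
a single disk, so nothing is gained, and any seek penalty larger than
one block time makes the seeking case worse than a single disk.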

The only way to make the "1 thread sequential read" case faster is to
change the disk layout to

Disk 1: ACEGIKBDFHJ
Disk 2: ACEGIKBDFHJ

where disk 1 again reads block A, and disk 2 reads block B.
To read block C, disk 1 doesn't have to move its head or read and
discard a dummy block; it can read block C sequentially, and disk 2 can
read block D the same way.

That way each disk reads only the relevant blocks, sequentially,
and you get (in theory) 2x the performance of one disk.
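
The alternative layout amounts to a simple index mapping. A minimal
sketch, assuming two disks and the 11-block example above;
`physical_position` and `layout` are hypothetical helper names, not
anything from the md driver:

```python
import string


def physical_position(logical, n_blocks):
    """Map a logical block index to its slot in the interleaved layout.

    Even-numbered logical blocks are packed into the first part of the
    disk, odd-numbered ones into the second part.
    """
    first_part = (n_blocks + 1) // 2  # number of even-indexed blocks
    if logical % 2 == 0:
        return logical // 2
    return first_part + logical // 2


def layout(n_blocks):
    """Return the on-disk order as a string of block letters (A, B, ...)."""
    names = string.ascii_uppercase[:n_blocks]
    slots = [None] * n_blocks
    for logical, name in enumerate(names):
        slots[physical_position(logical, n_blocks)] = name
    return "".join(slots)
```

For 11 blocks this reproduces the ACEGIKBDFHJ picture: the blocks disk 1
wants (A, C, E, ...) sit in slots 0 to 5 and the blocks disk 2 wants
(B, D, F, ...) sit in slots 6 to 10, so each disk reads one contiguous
run.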

Greetings,
    Arjan van de Ven


* Re: raid1 performance
  2002-04-30 12:38 ` Arjan van de Ven
@ 2002-04-30 14:21   ` Kent Borg
  2002-05-01 16:35     ` Jakob Østergaard
  0 siblings, 1 reply; 9+ messages in thread
From: Kent Borg @ 2002-04-30 14:21 UTC
  To: Arjan van de Ven; +Cc: Jaime Medrano, linux-kernel

On Tue, Apr 30, 2002 at 01:38:16PM +0100, Arjan van de Ven wrote, very
roughly: 
[that RAID 1 is only as fast in reading as the fastest disk because of
seeking over alternate blocks, and ]

> The only way to get the "1 thread sequential read" case faster is by
> modifying the disk layout to be
> 
> Disk 1: ACEGIKBDFHJ
> Disk 2: ACEGIKBDFHJ
> 
> where disk 1 again reads block A, and disk 2 reads block B.  To read
> block C, disk 1 doesn't have to move its head or read a dummy block
> away, it can read block C sequentially, and disk 2 can read block D
> that way.
>
> That way the disks actually each only read the relevant blocks in a
> sequential way and you get (in theory) 2x the performance of 1 disk.

I am confused.  

Assuming a read is big enough to be parallelized across two disks,
why can't the second disk be told not to read alternate blocks but
instead to read sequential blocks starting halfway through the
request?

Also, why does hdparm give me significantly faster read numbers on
/dev/md<whatever> than it does on /dev/hd<whatever>?  I had assumed
there was parallelizing going on.  Does this mean I would get a speed
improvement if I ran my single disk notebook as a single disk RAID 1
because there is some bigger or better buffering going on in that code
even without parallelizing?

Thanks,

-kb


* Re: raid1 performance
  2002-04-30 14:21   ` Kent Borg
@ 2002-05-01 16:35     ` Jakob Østergaard
  2002-05-01 17:01       ` Kent Borg
  0 siblings, 1 reply; 9+ messages in thread
From: Jakob Østergaard @ 2002-05-01 16:35 UTC
  To: Kent Borg; +Cc: Arjan van de Ven, Jaime Medrano, linux-kernel

On Tue, Apr 30, 2002 at 10:21:48AM -0400, Kent Borg wrote:
> On Tue, Apr 30, 2002 at 01:38:16PM +0100, Arjan van de Ven wrote, very
> roughly: 
> [that RAID 1 is only as fast in reading as the fastest disk because of
> seeking over alternate blocks, and ]
> 
> > The only way to get the "1 thread sequential read" case faster is by
> > modifying the disk layout to be
> > 
> > Disk 1: ACEGIKBDFHJ
> > Disk 2: ACEGIKBDFHJ
> > 
> > where disk 1 again reads block A, and disk 2 reads block B.  To read
> > block C, disk 1 doesn't have to move its head or read a dummy block
> > away, it can read block C sequentially, and disk 2 can read block D
> > that way.
> >
> > That way the disks actually each only read the relevant blocks in a
> > sequential way and you get (in theory) 2x the performance of 1 disk.
> 
> I am confused.  
> 
> Assuming a read is big enough to be parallelized across two disks,
> why can't the second disk be told not to read alternate blocks but
> instead to read sequential blocks starting halfway through the
> request?

This is *not* as simple as it sounds.  Believe me, I spent a week trying...

However, with ext2 (and other filesystems as well), a large sequential file
read is *not* sequential on the disk.  You should actually see better performance
on RAID-1 than on a single disk for very large reads, because some of the lookups
needed (block indirection or whatever) will be run by the "best" disk in the given
situation.

> 
> Also, why does hdparm give me significantly faster read numbers on
> /dev/md<whatever> than it does on /dev/hd<whatever>?  I had assumed
> there was parallelizing going on.  Does this mean I would get a speed
> improvement if I ran my single disk notebook as a single disk RAID 1
> because there is some bigger or better buffering going on in that code
> even without parallelizing?

hdparm is not a good benchmark for this.

Use bonnie, bonnie++, tiotest, or even 'dd' with *huge* files.
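
A dd-style sequential read can be approximated in a few lines. This is a
minimal sketch: `sequential_read_mb_per_s`, `block_size` and `max_bytes`
are illustrative names, and the result is only meaningful when the file
is much larger than RAM (or the page cache is cold); otherwise it
measures memory rather than the disks.

```python
import time


def sequential_read_mb_per_s(path, block_size=1024 * 1024, max_bytes=None):
    """Read `path` front to back and return (bytes_read, MiB per second).

    `max_bytes` optionally caps how much is read, which is useful when
    `path` is a whole device rather than a file.
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python's own buffer; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while max_bytes is None or total < max_bytes:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    rate = total / (1024 * 1024) / elapsed if elapsed > 0 else float("inf")
    return total, rate
```

Running it against the same large file on a single disk and on each
array gives numbers that can be compared with each other, though not
directly with hdparm's.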

-- 
................................................................
:   jakob@unthought.net   : And I see the elder races,         :
:.........................: putrid forms of man                :
:   Jakob Østergaard      : See him rise and claim the earth,  :
:        OZ9ABN           : his downfall is at hand.           :
:.........................:............{Konkhra}...............:


* Re: raid1 performance
  2002-05-01 16:35     ` Jakob Østergaard
@ 2002-05-01 17:01       ` Kent Borg
  2002-05-01 17:16         ` Justin Cormack
  2002-05-01 21:23         ` Bernd Eckenfels
  0 siblings, 2 replies; 9+ messages in thread
From: Kent Borg @ 2002-05-01 17:01 UTC
  To: Jakob Østergaard, Arjan van de Ven, Jaime Medrano,
	linux-kernel

On Wed, May 01, 2002 at 06:35:53PM +0200, Jakob Østergaard wrote:
> This is *not* as simple as it sounds.  Believe me, I spent a week trying...
> 
> However, with ext2 (and other filesystems as well), a large sequential file
> read is *not* sequential on the disk.  You should actually see better performance
> on RAID-1 than on a single disk for very large reads, because some of the lookups
> needed (block indirection or whatever) will be run by the "best" disk in the given
> situation.

Lemme see if I am getting closer.  

When reading the disk there will be head seeks necessary.  When there
are two disks, each with its own complete copy of all the data, there
is no reason to keep the two disks' heads in the same place.  If their
heads are in different places, a read can be issued to the disk whose
heads are closer to the desired location.

This then brings up two more questions:

  1. Does the OS even know where the heads are in a modern IDE disk?

  2. Is "closer" any more finely grained than a binary
     positioned/not-positioned?

And I guess another question: How much does RAID 1 help and under what
kinds of usage?


Thanks,

-kb, the Kent who is getting smarter.


* Re: raid1 performance
  2002-05-01 17:01       ` Kent Borg
@ 2002-05-01 17:16         ` Justin Cormack
  2002-05-01 21:23         ` Bernd Eckenfels
  1 sibling, 0 replies; 9+ messages in thread
From: Justin Cormack @ 2002-05-01 17:16 UTC
  To: Kent Borg; +Cc: linux-kernel


> Lemme see if I am getting closer.  
> 
> When reading the disk there will be head seeks necessary.  When there
> are two disks, each with its own complete copy of all the data, there
> is no reason to keep the two disks' heads in the same place.  If their
> heads are in different places, a read can be issued to the disk whose
> heads are closer to the desired location.

Yes. Look at raid1.c: the code is quite clear. Older versions didn't do this.

> This then brings up two more questions:
> 
>   1. Does the OS even know where the heads are in a modern IDE disk?

Not really. But there is probably a vague correspondence, especially if
you haven't remapped any bad sectors.

>   2. Is "closer" any more finely grained than a binary
>      positioned/not-positioned?

I think so. You can see different performance regions on disks (i.e. they
are faster on the outer tracks, for example). You could of course write a
program to test seek times from different areas and build up a real
locality map. It might not be worth it though.
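
Such a probe might look roughly like the following. It is a sketch under
assumptions, not a finished tool: `probe_seek_times` is a hypothetical
name, "distance" is simply the byte offset difference between
consecutive reads, and on a real device the page cache and the drive's
own cache would have to be defeated for the timings to reflect head
movement.

```python
import os
import random
import time


def probe_seek_times(path, samples=200, block_size=4096, seed=0):
    """Time small reads at random offsets; return (distance, seconds) pairs."""
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDONLY)
    try:
        # lseek to the end also works for block devices, where st_size is 0.
        size = os.lseek(fd, 0, os.SEEK_END)
        n_blocks = size // block_size
        if n_blocks < 2:
            raise ValueError("target is too small to probe")
        os.pread(fd, block_size, 0)  # start from a known position
        prev = 0
        results = []
        for _ in range(samples):
            target = rng.randrange(n_blocks) * block_size
            t0 = time.perf_counter()
            os.pread(fd, block_size, target)
            results.append((abs(target - prev), time.perf_counter() - t0))
            prev = target
        return results
    finally:
        os.close(fd)
```

Sorting the pairs by distance and averaging the timings per bucket would
give a first approximation of the locality map.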

> And I guess another question: How much does RAID 1 help and under what
> kinds of usage?

The latency is noticeably lower in some cases, as the seeks should be
shorter on average. I have found this useful sometimes.

Justin


* Re: raid1 performance
  2002-05-01 17:01       ` Kent Borg
  2002-05-01 17:16         ` Justin Cormack
@ 2002-05-01 21:23         ` Bernd Eckenfels
  2002-05-02 16:37           ` Jakob Østergaard
  1 sibling, 1 reply; 9+ messages in thread
From: Bernd Eckenfels @ 2002-05-01 21:23 UTC
  To: linux-kernel

In article <20020501130127.A10936@borg.org> you wrote:
>  1. Does the OS even know where the heads are in a modern IDE disk?

>  2. Is "closer" any more finely grained than a binary
>     positioned/not-positioned?

> And I guess another question: How much does RAID 1 help and under what
> kinds of usage?

No, you just distribute the reads round robin, which means each disk has
only half the seeks it had before. As long as you do not split up
contiguous blocks (readahead), the stats are good and you actually reduce
overall seeks. This helps even if no seek is involved, because you need
to wait for the beginning of a track to read it.

Greetings
Bernd


* Re: raid1 performance
  2002-05-01 21:23         ` Bernd Eckenfels
@ 2002-05-02 16:37           ` Jakob Østergaard
  2002-06-29  0:01             ` Bernd Eckenfels
  0 siblings, 1 reply; 9+ messages in thread
From: Jakob Østergaard @ 2002-05-02 16:37 UTC
  To: Bernd Eckenfels; +Cc: linux-kernel

On Wed, May 01, 2002 at 11:23:23PM +0200, Bernd Eckenfels wrote:
> In article <20020501130127.A10936@borg.org> you wrote:
> >  1. Does the OS even know where the heads are in a modern IDE disk?
> 
> >  2. Is "closer" any more finely grained than a binary
> >     positioned/not-positioned?
> 
> > And I guess another question: How much does RAID 1 help and under what
> > kinds of usage?
> 
> No, you just distribute the reads round robin, which means each disk has
> only half the seeks it had before.

No, this is the way it was done a long time ago.

It turns out to be an incredibly bad idea.  In fact, it is the most CPU-efficient
way of guaranteeing the largest average seek times on your disks  ;)

The RAID-1 code now looks at which disk worked closest to the wanted position
last, and picks that disk for the seek.
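
The difference between the two policies can be illustrated with a toy
simulation. This is a sketch under assumptions, not the kernel code:
each mirror's head position is taken to be the sector of the last read
it served, seek cost is taken to be the distance in sector numbers, and
all names are illustrative.

```python
def pick_round_robin(heads, sector, state):
    """Old policy: alternate between mirrors regardless of head position."""
    disk = state["next"]
    state["next"] = (disk + 1) % len(heads)
    return disk


def pick_nearest(heads, sector, state):
    """Pick the mirror whose last position is closest to the wanted sector."""
    return min(range(len(heads)), key=lambda d: abs(heads[d] - sector))


def average_seek(policy, requests, n_disks=2):
    """Average seek distance (in sectors) for a stream of read requests."""
    heads = [0] * n_disks
    state = {"next": 0}
    total = 0
    for sector in requests:
        disk = policy(heads, sector, state)
        total += abs(heads[disk] - sector)
        heads[disk] = sector  # simplification: ignore the request length
    return total / len(requests)
```

Feeding both policies the same list of random sector numbers and
comparing `average_seek(pick_nearest, requests)` with
`average_seek(pick_round_robin, requests)` shows the nearest-position
policy seeking less on average.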

> As long as you do not split up contiguous blocks (readahead), the stats
> are good and you actually reduce overall seeks. This helps even if no
> seek is involved, because you need to wait for the beginning of a track
> to read it.

The "new" code (which is not that new anymore) will allow one disk to keep
on a single sequential read for a long time (eventually it will kick in the
idle disk(s) though).
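
The behaviour described here can be pictured as follows. This is a
hypothetical sketch, not the actual raid1.c logic: the class name, the
`sect_limit` budget and its default value are assumptions made for
illustration only.

```python
class StickyReadBalancer:
    """Toy balancer: stay on one mirror for a sequential run, then rotate."""

    def __init__(self, n_disks, sect_limit=128):
        self.heads = [0] * n_disks    # sector just past each mirror's last read
        self.current = 0              # mirror that served the previous read
        self.run_sectors = 0          # sectors served in the current run
        self.sect_limit = sect_limit  # assumed run budget, in sectors

    def choose(self, sector, n_sectors):
        sequential = sector == self.heads[self.current]
        if sequential and self.run_sectors < self.sect_limit:
            # Keep streaming from the mirror that is already positioned.
            disk = self.current
        else:
            if sequential:
                # Budget used up: hand the stream to the next mirror.
                disk = (self.current + 1) % len(self.heads)
            else:
                # Not sequential: pick the mirror with the nearest position.
                disk = min(range(len(self.heads)),
                           key=lambda d: abs(self.heads[d] - sector))
            self.run_sectors = 0
        self.current = disk
        self.run_sectors += n_sectors
        self.heads[disk] = sector + n_sectors
        return disk
```

With a budget of 16 sectors and 8-sector sequential reads, the stream
stays on one mirror for two requests before moving to the other, which
is the "keep going, then eventually switch" pattern in miniature.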

-- 
................................................................
:   jakob@unthought.net   : And I see the elder races,         :
:.........................: putrid forms of man                :
:   Jakob Østergaard      : See him rise and claim the earth,  :
:        OZ9ABN           : his downfall is at hand.           :
:.........................:............{Konkhra}...............:


* Re: raid1 performance
  2002-05-02 16:37           ` Jakob Østergaard
@ 2002-06-29  0:01             ` Bernd Eckenfels
  0 siblings, 0 replies; 9+ messages in thread
From: Bernd Eckenfels @ 2002-06-29  0:01 UTC
  To: linux-kernel

In article <20020502183758.Q31556@unthought.net> you wrote:
>> No, you just distribute the reads round robin, which means each disk has
>> only half the seeks it had before.

> No, this is the way it was done a long time ago.

> It turns out to be an incredibly bad idea.  In fact, it is the most CPU-efficient
> way of guaranteeing the largest average seek times on your disks  ;)

> The RAID-1 code now looks at which disk worked closest to the wanted position
> last, and picks that disk for the seek.

That's right, it is based on the distance in sector numbers. That's a simple
comparison; I am not sure one could do better.

raid1.c:raid1_read_balance()

Greetings
Bernd

