* Propose of enhancement of raid1 driver
@ 2006-10-17 6:51 Miroslaw Mieszczak
2006-10-19 3:28 ` Neil Brown
0 siblings, 1 reply; 14+ messages in thread
From: Miroslaw Mieszczak @ 2006-10-17 6:51 UTC (permalink / raw)
To: linux-raid
I would like to propose an enhancement to the raid1 driver in the Linux kernel.
The enhancement would speed up reading data from mirrored partitions.
The idea is simple.
If a partition is mirrored over 2 disks, and these disks are in sync, it is
possible to read the data from both disks simultaneously, in the same way
as in raid 0. So chunk1 would be read from the master and chunk2 from the slave
at the same time.
As a result, this would give a significant speedup of read operations (comparable
with the speed of raid 0 disks).
* Re: Propose of enhancement of raid1 driver
2006-10-17 6:51 Propose of enhancement of raid1 driver Miroslaw Mieszczak
@ 2006-10-19 3:28 ` Neil Brown
2006-10-19 16:06 ` Doug Ledford
` (3 more replies)
0 siblings, 4 replies; 14+ messages in thread
From: Neil Brown @ 2006-10-19 3:28 UTC (permalink / raw)
To: Miroslaw Mieszczak; +Cc: linux-raid
On Tuesday October 17, mirek@mieszczak.com.pl wrote:
> I would like to propose an enhancement of raid 1 driver in linux kernel.
> The enhancement would be speedup of data reading on mirrored partitions.
> The idea is easy.
> If we have mirrored partition over 2 disks, and these disk are in sync, there is
> possibility of simultaneous reading of the data from both disks on the same way
> as in raid 0. So it would be chunk1 read from master, chunk2 read from slave at
> the same time.
> As result it would give significant speedup of read operation (comparable with
> speed of raid 0 disks).
This is not as easy as it sounds.
Skipping over blocks within a track is no faster than reading blocks
in the track, so you would need to make sure that your chunk size is
larger than one track - probably it would need to be several tracks.
Raid1 already does some read-balancing, though it is possible (even
likely) that it doesn't balance very effectively. Working out how
best to do the balancing in general is a non-trivial task, but would
be worth spending time on.
The raid10 module in linux supports a layout described as 'far=2'.
In this layout, with two drives, the first half of the drives is used
for a raid0, and the second half is used for a mirrored raid0 with the
data on the other disk.
In this layout reads should certainly go at raid0 speeds, though
there is cost in the speed of writes.
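A minimal sketch of the two-drive 'far=2' layout as described above, to make the chunk placement concrete. It is a simplified model of that description, not the md/raid10 code; the function name `far2_locations` and the parameter `rows_per_half` (number of chunk rows in each half of a drive) are made up for illustration.

```python
def far2_locations(chunk, rows_per_half, n_disks=2):
    """Return [(disk, row), (disk, row)] for the two copies of a logical chunk.

    Simplified model: the first half of every disk holds a plain raid0
    stripe; the second half holds the same stripe again, shifted by one
    disk so that each copy of a chunk lives on a different drive.
    """
    row = chunk // n_disks
    if chunk < 0 or row >= rows_per_half:
        raise ValueError("chunk is outside the array")
    first_copy = (chunk % n_disks, row)
    second_copy = ((chunk + 1) % n_disks, rows_per_half + row)
    return [first_copy, second_copy]
```

With `rows_per_half=100`, `far2_locations(3, 100)` returns `[(1, 1), (0, 101)]`. A sequential read can use only the first copies and so alternates between the drives like raid0, while every write has to touch both halves of the drives, which is where the write cost comes from.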
Maybe you would like to experiment. Write a program that reads from
two drives in parallel, reading all the 'odd' chunks from one drive
and the 'even' chunks from the other, and find out how fast it is.
Maybe you could get it to try lots of different chunk sizes and see
which is the fastest.
That might be quite helpful in understanding how to get read-balancing
working well.
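A rough sketch of such an experiment, assuming a Unix system with Python 3. The function names are made up for illustration. The two paths would be the two mirror halves (or, for a dry run, two identical files), and `total_size` is passed in by the caller. Unless the page cache is dropped or bypassed between runs, the timings mostly measure memory, so treat this as a starting point only.

```python
import os
import threading
import time


def read_alternate_chunks(path, chunk_size, parity, total_size, out):
    """Read every second chunk of `path`, starting at chunk index `parity`."""
    fd = os.open(path, os.O_RDONLY)
    try:
        nbytes = 0
        offset = parity * chunk_size
        while offset < total_size:
            want = min(chunk_size, total_size - offset)
            nbytes += len(os.pread(fd, want, offset))
            offset += 2 * chunk_size
        out[parity] = nbytes
    finally:
        os.close(fd)


def timed_split_read(path_a, path_b, chunk_size, total_size):
    """Read even chunks from path_a and odd chunks from path_b in parallel.

    Returns (elapsed_seconds, total_bytes_read).
    """
    out = {}
    threads = [
        threading.Thread(target=read_alternate_chunks,
                         args=(path_a, chunk_size, 0, total_size, out)),
        threading.Thread(target=read_alternate_chunks,
                         args=(path_b, chunk_size, 1, total_size, out)),
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return elapsed, out.get(0, 0) + out.get(1, 0)
```

Calling `timed_split_read` in a loop over chunk sizes (say 64 KiB up to several MiB) and comparing against a plain sequential read of one drive would give the comparison asked for above.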
NeilBrown
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Propose of enhancement of raid1 driver
2006-10-19 3:28 ` Neil Brown
@ 2006-10-19 16:06 ` Doug Ledford
2006-10-20 4:59 ` Jeff Breidenbach
` (2 subsequent siblings)
3 siblings, 0 replies; 14+ messages in thread
From: Doug Ledford @ 2006-10-19 16:06 UTC (permalink / raw)
To: Neil Brown; +Cc: Miroslaw Mieszczak, linux-raid
On Thu, 2006-10-19 at 13:28 +1000, Neil Brown wrote:
> On Tuesday October 17, mirek@mieszczak.com.pl wrote:
> > I would like to propose an enhancement of raid 1 driver in linux kernel.
> > The enhancement would be speedup of data reading on mirrored partitions.
> > The idea is easy.
> > If we have mirrored partition over 2 disks, and these disk are in sync, there is
> > possibility of simultaneous reading of the data from both disks on the same way
> > as in raid 0. So it would be chunk1 read from master, chunk2 read from slave at
> > the same time.
> > As result it would give significant speedup of read operation (comparable with
> > speed of raid 0 disks).
>
> This is not as easy as it sounds.
> Skipping over blocks within a track is no faster than reading blocks
> in the track, so you would need to make sure that your chunk size is
> larger than one track - probably it would need to be several tracks.
>
> Raid1 already does some read-balancing, though it is possible (even
> likely) that it doesn't balance very effectively. Working out how
> best to do the balancing in general is a non-trivial task, but would
> be worth spending time on.
>
> The raid10 module in linux supports a layout described as 'far=2'.
> In this layout, with two drives, the first half of the drives is used
> for a raid0, and the second half is used for a mirrored raid0 with the
> data on the other disk.
> In this layout reads should certainly go at raid0 speeds, though
> there is cost in the speed of writes.
>
> Maybe you would like to experiment. Write a program that reads from
> two drives in parallel, reading all the 'odd' chunks from one drive
> and the 'even' chunks from the other, and find out how fast it is.
> Maybe you could get it to try lots of different chunk sizes and see
> which is the fastest.
Too artificial. The results of this sort of test would not translate
well to real world usage.
> That might be quite helpful in understanding how to get read-balancing
> working well.
Doing *good* read balancing is hard, especially given things like FC
attached storage, iSCSI/iSER, etc. If I wanted to do this right, I'd
start by teaching the md code to look more deeply into block devices,
possibly even with a self-tuning series of reads at startup to test
things like close-seek sequential operation times versus maximum-seek
throughput. That would clue you in as to whether the device you are
talking to might have more than 1 physical spindle, which would impact
the cost you associate with seek-requiring operations relative to
bandwidth-heavy operations. I might even go so far as to look into the
SCSI transport classes for clues about data throughput at bus bandwidth
versus command startup/teardown costs on the bus, so you have an accurate
idea of whether lots of outstanding small commands are likely to cause
your device to suffer bus starvation issues from overhead. Then I'd use
that data to help me numerically quantify the load on a device, updated
when a command is added to the block layer queue (the queued load),
when the command is actually removed from the block queue and sent to
the device (the active load), and again when the command is
received back. Then, I'd basically look at what an incoming command
*would* do to each constituent disk's load values to see whether it
should go to one or the other. But, that's just off the top of my head
and I may be on crack...I didn't check what my wife handed me this
morning.
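The load accounting described above could be sketched roughly as follows. Everything here is hypothetical: the class `DiskLoad`, the two cost constants (which the startup probing would have to supply), and the linear cost formula are illustrative assumptions, not anything md implements.

```python
from dataclasses import dataclass


@dataclass
class DiskLoad:
    seek_cost: float          # assumed cost per sector of head travel
    transfer_cost: float      # assumed cost per sector transferred
    head: int = 0             # where the last queued request ends
    queued: float = 0.0       # load sitting in the block layer queue
    active: float = 0.0       # load already sent to the device

    def cost_of(self, sector, nsectors):
        """Estimated cost of a request if it were sent to this disk."""
        seek = abs(sector - self.head)
        return self.seek_cost * seek + self.transfer_cost * nsectors

    def enqueue(self, sector, nsectors):
        """Account for a request added to the queue; returns its cost."""
        cost = self.cost_of(sector, nsectors)
        self.queued += cost
        self.head = sector + nsectors
        return cost

    def dispatch(self, cost):
        """The request left the queue and was sent to the device."""
        self.queued -= cost
        self.active += cost

    def complete(self, cost):
        """The device reported the request as finished."""
        self.active -= cost


def pick_disk(disks, sector, nsectors):
    """Index of the disk whose load would be lowest after taking the request."""
    def resulting_load(i):
        d = disks[i]
        return d.queued + d.active + d.cost_of(sector, nsectors)
    return min(range(len(disks)), key=resulting_load)
```

The caller would use `pick_disk` to choose a mirror, call `enqueue` on it, and later call `dispatch` and `complete` with the returned cost as the request moves through the stack.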
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
* Re: Propose of enhancement of raid1 driver
2006-10-19 3:28 ` Neil Brown
2006-10-19 16:06 ` Doug Ledford
@ 2006-10-20 4:59 ` Jeff Breidenbach
2006-10-21 16:01 ` Tomasz Chmielewski
2006-10-23 15:28 ` Mario 'BitKoenig' Holbe
3 siblings, 0 replies; 14+ messages in thread
From: Jeff Breidenbach @ 2006-10-20 4:59 UTC (permalink / raw)
To: Neil Brown; +Cc: Miroslaw Mieszczak, linux-raid
> The raid10 module in linux supports a layout described as 'far=2'.
> In this layout, with two drives, the first half of the drives is used
> for a raid0, and the second half is used for a mirrored raid0 with the
> data on the other disk. In this layout reads should certainly go at
> raid0 speeds, though there is cost in the speed of writes.
Remember, RAID-10 far=2 mode was very slow in July 2005; don't
know if it has improved since then. See:
http://www.mail-archive.com/linux-raid@vger.kernel.org/msg02339.html
PS. If you ever need it I have plenty of data from a super redundant
5 disk RAID-1 recorded with atsar. It does quite well balancing
many small, parallel random reads from a webserver. Struggles
a little when mixing a single large read with many small parallel
reads.
* Re: Propose of enhancement of raid1 driver
2006-10-19 3:28 ` Neil Brown
2006-10-19 16:06 ` Doug Ledford
2006-10-20 4:59 ` Jeff Breidenbach
@ 2006-10-21 16:01 ` Tomasz Chmielewski
2006-10-23 1:33 ` Neil Brown
2006-10-23 15:28 ` Mario 'BitKoenig' Holbe
3 siblings, 1 reply; 14+ messages in thread
From: Tomasz Chmielewski @ 2006-10-21 16:01 UTC (permalink / raw)
To: Neil Brown; +Cc: Miroslaw Mieszczak, linux-raid
Neil Brown wrote:
> On Tuesday October 17, mirek@mieszczak.com.pl wrote:
>> I would like to propose an enhancement of raid 1 driver in linux kernel.
>> The enhancement would be speedup of data reading on mirrored partitions.
>> The idea is easy.
>> If we have mirrored partition over 2 disks, and these disk are in sync, there is
>> possibility of simultaneous reading of the data from both disks on the same way
>> as in raid 0. So it would be chunk1 read from master, chunk2 read from slave at
>> the same time.
>> As result it would give significant speedup of read operation (comparable with
>> speed of raid 0 disks).
>
> This is not as easy as it sounds.
> Skipping over blocks within a track is no faster than reading blocks
> in the track, so you would need to make sure that your chunk size is
> larger than one track - probably it would need to be several tracks.
What you said is certainly true when we read one file at a given moment.
What if we read two different files at a given time? Certainly, it would
be faster if DRIVE_1 reads FILE_1, and DRIVE_2 reads FILE_2.
> Raid1 already does some read-balancing, though it is possible (even
> likely) that it doesn't balance very effectively. Working out how
> best to do the balancing in general is a non-trivial task, but would
> be worth spending time on.
Probably what I said before isn't quite correct, as RAID-1 has no idea of
the filesystem that is on top of it; rather, it will see attempts to
access different areas of the array?
--
Tomasz Chmielewski
http://wpkg.org
* Re: Propose of enhancement of raid1 driver
2006-10-21 16:01 ` Tomasz Chmielewski
@ 2006-10-23 1:33 ` Neil Brown
0 siblings, 0 replies; 14+ messages in thread
From: Neil Brown @ 2006-10-23 1:33 UTC (permalink / raw)
To: Tomasz Chmielewski; +Cc: Miroslaw Mieszczak, linux-raid
On Saturday October 21, mangoo@wpkg.org wrote:
> Neil Brown wrote:
> > On Tuesday October 17, mirek@mieszczak.com.pl wrote:
> >> I would like to propose an enhancement of raid 1 driver in linux kernel.
> >> The enhancement would be speedup of data reading on mirrored partitions.
> >> The idea is easy.
> >> If we have mirrored partition over 2 disks, and these disk are in sync, there is
> >> possibility of simultaneous reading of the data from both disks on the same way
> >> as in raid 0. So it would be chunk1 read from master, chunk2 read from slave at
> >> the same time.
> >> As result it would give significant speedup of read operation (comparable with
> >> speed of raid 0 disks).
> >
> > This is not as easy as it sounds.
> > Skipping over blocks within a track is no faster than reading blocks
> > in the track, so you would need to make sure that your chunk size is
> > larger than one track - probably it would need to be several tracks.
>
> What you said is certainly true when we read one file at a given moment.
>
> What if we read two different files at a given time? Certainly, it would
> be faster if DRIVE_1 reads FILE_1, and DRIVE_2 reads FILE_2.
And it is possible that md/raid1 currently does that (I think it
depends a bit on the size and location of the files).
But what if you are reading 3 separate areas of the filesystem...
Sharing that over 2 drives when you don't know in advance what will be
done is probably non-trivial :-)
>
>
> > Raid1 already does some read-balancing, though it is possible (even
> > likely) that it doesn't balance very effectively. Working out how
> > best to do the balancing in general is a non-trivial task, but would
> > be worth spending time on.
>
> Probably what I said before isn't very correct, as RAID-1 has no idea of
> the filesystem that is on top of it; rather, it will see attempts to
> access different areas of the array?
>
Exactly. It just sees block addresses. It tends to prefer to send a
request to a drive that has recently been asked to read a block near
the new request.
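As an illustration of that preference, here is a toy model that only tracks where each mirror last finished reading and sends a new request to the closest one. It is a simplification of the behaviour described above, not the md/raid1 read-balancing code, and the function names are invented.

```python
def choose_mirror(last_pos, sector):
    """Index of the mirror whose last read position is closest to `sector`.

    Ties go to the lowest-numbered mirror.
    """
    return min(range(len(last_pos)), key=lambda i: abs(last_pos[i] - sector))


def submit_read(last_pos, sector, nsectors):
    """Pick a mirror for the read and remember where its head ends up."""
    i = choose_mirror(last_pos, sector)
    last_pos[i] = sector + nsectors
    return i
```

With two mirrors and two sequential streams in distant areas, each stream tends to stick to "its" drive. With three streams, two of them end up fighting over one drive, which is the non-trivial case mentioned above.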
NeilBrown
* Re: Propose of enhancement of raid1 driver
2006-10-19 3:28 ` Neil Brown
` (2 preceding siblings ...)
2006-10-21 16:01 ` Tomasz Chmielewski
@ 2006-10-23 15:28 ` Mario 'BitKoenig' Holbe
2006-10-28 17:31 ` Al Boldi
3 siblings, 1 reply; 14+ messages in thread
From: Mario 'BitKoenig' Holbe @ 2006-10-23 15:28 UTC (permalink / raw)
To: linux-raid
Neil Brown <neilb@suse.de> wrote:
> Skipping over blocks within a track is no faster than reading blocks
> in the track, so you would need to make sure that your chunk size is
Not even "no faster" but probably even "slower".
For seek()+read() from userspace this gets smoothed by the kernel's
implicit read-ahead but for md this would most likely be different.
It gets IMHO further smoothed by the drive's internal cache.
Of course, there is a break-even point where seek()+read() gets faster
than sequential read() but this is surely environment-dependent.
A student of mine did some tests (in another context and from userspace,
though) to find the break-even point and found seek()+read() and
sequential read() to be quite equal up to 512k and significantly
different only at 1M and more (in his testing environment, of course).
So, to utilize this one would surely need some adaptive strategy which
measures the break-even point and somehow also identifies "large
sequential read()s" to minimize useless data transfers (and thus cache
poisoning at the different caching stages).
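A userspace measurement of this kind might look roughly like the sketch below. It is not the student's test program. The function names are hypothetical, it assumes a regular file on a Unix system (`os.path.getsize` does not report the size of a block device), and the page cache has to be cold for the numbers to mean anything.

```python
import os
import time


def time_reads(path, block, stride):
    """Read `block` bytes at offsets 0, stride, 2*stride, ...

    Returns (elapsed_seconds, bytes_read). With stride == block this is a
    sequential read; with stride == 2 * block every second block is skipped.
    """
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        nbytes = 0
        for offset in range(0, size, stride):
            nbytes += len(os.pread(fd, block, offset))
        return time.perf_counter() - start, nbytes
    finally:
        os.close(fd)


def compare_block_sizes(path, blocks):
    """For each block size, time a full read and a skip-every-other read."""
    results = []
    for block in blocks:
        seq_time, _ = time_reads(path, block, block)
        skip_time, _ = time_reads(path, block, 2 * block)
        results.append((block, seq_time, skip_time))
    return results
```

Below the break-even point the skipping read takes about as long as the sequential one even though it transfers half the data; above it the skipping read should approach half the sequential time.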
regards
Mario
--
But after a while I learned the trick of speaking fast. You don't have
to think any faster; just use twice as many words to say everything.
-- Paul Graham
* Re: Propose of enhancement of raid1 driver
2006-10-23 15:28 ` Mario 'BitKoenig' Holbe
@ 2006-10-28 17:31 ` Al Boldi
2006-10-28 20:30 ` Mario 'BitKoenig' Holbe
0 siblings, 1 reply; 14+ messages in thread
From: Al Boldi @ 2006-10-28 17:31 UTC (permalink / raw)
To: linux-raid
Mario 'BitKoenig' Holbe wrote:
> Neil Brown <neilb@suse.de> wrote:
> > Skipping over blocks within a track is no faster than reading blocks
> > in the track, so you would need to make sure that your chunk size is
>
> Not even "no faster" but probably even "slower".
Surely slower, on conventional hds anyway.
But what still isn't clear: why can't raid1 use something like the raid10
offset=2 mode?
Thanks!
--
Al
* Re: Propose of enhancement of raid1 driver
2006-10-28 17:31 ` Al Boldi
@ 2006-10-28 20:30 ` Mario 'BitKoenig' Holbe
2006-10-30 15:44 ` Al Boldi
0 siblings, 1 reply; 14+ messages in thread
From: Mario 'BitKoenig' Holbe @ 2006-10-28 20:30 UTC (permalink / raw)
To: linux-raid
Al Boldi <a1426z@gawab.com> wrote:
> But what still isn't clear, why can't raid1 use something like the raid10
> offset=2 mode?
RAID1 has equal data on all mirrors, so sooner or later you have to seek
somewhere - no matter how you lay out the data on each mirror.
regards
Mario
--
Programming in C++ keeps the grey cells alive. It sharpens
all five senses: idiocy, silliness, madness,
nonsense and dullness.
[Holger Veit in doc]
* Re: Propose of enhancement of raid1 driver
2006-10-28 20:30 ` Mario 'BitKoenig' Holbe
@ 2006-10-30 15:44 ` Al Boldi
2006-10-30 17:02 ` Mario 'BitKoenig' Holbe
0 siblings, 1 reply; 14+ messages in thread
From: Al Boldi @ 2006-10-30 15:44 UTC (permalink / raw)
To: linux-raid
Mario 'BitKoenig' Holbe wrote:
> Al Boldi <a1426z@gawab.com> wrote:
> > But what still isn't clear, why can't raid1 use something like the
> > raid10 offset=2 mode?
>
> RAID1 has equal data on all mirrors, so sooner or later you have to seek
> somewhere - no matter how you layout the data on each mirror.
Don't underestimate the effects mere layout can have on multi-disk array
performance, despite it being highly hw dependent.
The best approach would probably involve a user-configurable layout table, to
tune it to the specific hw.
Thanks!
--
Al
* Re: Propose of enhancement of raid1 driver
2006-10-30 15:44 ` Al Boldi
@ 2006-10-30 17:02 ` Mario 'BitKoenig' Holbe
2006-10-30 17:55 ` Al Boldi
0 siblings, 1 reply; 14+ messages in thread
From: Mario 'BitKoenig' Holbe @ 2006-10-30 17:02 UTC (permalink / raw)
To: linux-raid
Al Boldi <a1426z@gawab.com> wrote:
> Don't underestimate the effects mere layout can have on multi-disk array
> performance, despite it being highly hw dependent.
I can't see the difference between equal mirrors and somehow interleaved
layout on RAID1. Since you have to seek anyways, there should be no
difference between both approaches once you read big enough chunks. The
problem with reading big chunks is: you probably read far too much when
you don't really need the data you did read. And vice versa: when you
don't read big chunks, it doesn't matter how your data is laid out.
regards
Mario
--
Whenever you design a better fool-proof software,
the genetic pool will always design a better fool.
* Re: Propose of enhancement of raid1 driver
2006-10-30 17:02 ` Mario 'BitKoenig' Holbe
@ 2006-10-30 17:55 ` Al Boldi
2006-10-30 17:59 ` Jeff Breidenbach
2006-10-30 20:50 ` Mario 'BitKoenig' Holbe
0 siblings, 2 replies; 14+ messages in thread
From: Al Boldi @ 2006-10-30 17:55 UTC (permalink / raw)
To: linux-raid
Mario 'BitKoenig' Holbe wrote:
> Al Boldi <a1426z@gawab.com> wrote:
> > Don't underestimate the effects mere layout can have on multi-disk array
> > performance, despite it being highly hw dependent.
>
> I can't see the difference between equal mirrors and somehow interleaved
> layout on RAID1. Since you have to seek anyways, there should be no
> difference between both approaches once you read big enough chunks. The
> problem with reading big chunks is: you probably read far too much when
> you don't really need the data you did read.
Think adaptive.
> And vice versa: when you
> don't read big chunks, it doesn't matter how your data is laid out.
Think tracks and heads, physical that is.
Thanks!
--
Al
* Re: Propose of enhancement of raid1 driver
2006-10-30 17:55 ` Al Boldi
@ 2006-10-30 17:59 ` Jeff Breidenbach
2006-10-30 20:50 ` Mario 'BitKoenig' Holbe
1 sibling, 0 replies; 14+ messages in thread
From: Jeff Breidenbach @ 2006-10-30 17:59 UTC (permalink / raw)
To: Al Boldi; +Cc: linux-raid
If linux RAID-10 is still much slower than RAID-1 this discussion is kind
of moot, right?
Jeff
* Re: Propose of enhancement of raid1 driver
2006-10-30 17:55 ` Al Boldi
2006-10-30 17:59 ` Jeff Breidenbach
@ 2006-10-30 20:50 ` Mario 'BitKoenig' Holbe
1 sibling, 0 replies; 14+ messages in thread
From: Mario 'BitKoenig' Holbe @ 2006-10-30 20:50 UTC (permalink / raw)
To: linux-raid
Al Boldi <a1426z@gawab.com> wrote:
> Think adaptive.
You didn't read the entire thread, did you?
> Think tracks and heads, physical that is.
I would like to do so. However, it's quite impossible to get the real
layout of today's disks. And even worse - it can change at runtime, think
about sector reallocation.
regards
Mario
--
I've never been certain whether the moral of the Icarus story should
only be, as is generally accepted, "Don't try to fly too high," or
whether it might also be thought of as, "Forget the wax and feathers
and do a better job on the wings." -- Stanley Kubrick