Re: Propose of enhancement of raid1 driver

linux-raid.vger.kernel.org archive mirror

From: Doug Ledford <dledford@redhat.com>
To: Neil Brown <neilb@suse.de>
Cc: Miroslaw Mieszczak <mirek@mieszczak.com.pl>, linux-raid@vger.kernel.org
Subject: Re: Propose of enhancement of raid1 driver
Date: Thu, 19 Oct 2006 12:06:35 -0400
Message-ID: <1161273995.2917.588.camel@fc6.xsintricity.com>
In-Reply-To: <17718.61672.794099.425723@cse.unsw.edu.au>

On Thu, 2006-10-19 at 13:28 +1000, Neil Brown wrote:
> On Tuesday October 17, mirek@mieszczak.com.pl wrote:
> > I would like to propose an enhancement of the raid1 driver in the Linux
> > kernel. The enhancement would speed up reading data from mirrored
> > partitions. The idea is simple.
> > If we have a partition mirrored over 2 disks, and these disks are in sync,
> > it is possible to read data from both disks simultaneously, in the same way
> > as in raid 0. So chunk1 would be read from the master and chunk2 from the
> > slave at the same time.
> > As a result, reads would be significantly faster (comparable to the
> > speed of raid 0 disks).
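As a minimal sketch of the proposed read striping (not existing raid1 code), assuming every mirror holds an identical copy of the data at the same offsets, a read request could be split at chunk boundaries with chunk k sent to mirror k mod n. `split_read` is a hypothetical helper.

```python
def split_read(offset, length, chunk_size, n_mirrors=2):
    """Split a read of `length` bytes at `offset` into per-mirror pieces.

    Hypothetical helper: chunk k (offset // chunk_size) is served by mirror
    k % n_mirrors, the RAID0-style assignment proposed above. Because each
    mirror is a full copy, the same offset is valid on every mirror.
    Returns a list of (mirror, offset, length) tuples.
    """
    pieces = []
    end = offset + length
    while offset < end:
        chunk = offset // chunk_size
        chunk_end = (chunk + 1) * chunk_size
        piece = min(end, chunk_end) - offset  # stop at the chunk boundary
        pieces.append((chunk % n_mirrors, offset, piece))
        offset += piece
    return pieces


# A 10-byte read with 4-byte chunks alternates between mirror 0 and mirror 1.
print(split_read(0, 10, 4))
```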
> 
> This is not as easy as it sounds.
> Skipping over blocks within a track is no faster than reading blocks
> in the track, so you would need to make sure that your chunk size is
> larger than one track - probably it would need to be several tracks.
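The track argument can be made concrete with a toy model. The assumptions are stated loudly: a skipped chunk smaller than a track costs the time for the platter to rotate past it, and a larger one costs the cheaper of rotating past it or a short forward seek (`skip_seek`). Real drives with caches, zoned recording and command queuing will behave differently; `striped_speedup` and all parameter values are illustrative.

```python
def striped_speedup(total_bytes, chunk, track_bytes, media_rate, skip_seek):
    """Toy estimate of read speedup when two mirrors alternate chunks.

    Each disk reads every other chunk of a region of total_bytes.
    Assumption: skipping data within a track is no faster than reading it
    (the platter must rotate past it); skipping a chunk of at least one
    track costs min(rotation time, skip_seek seconds).
    """
    single_disk = total_bytes / media_rate         # one disk reads everything
    per_disk_data = (total_bytes / 2) / media_rate  # each mirror reads half
    skips = total_bytes / (2 * chunk)               # chunks each disk skips
    rotate_past = chunk / media_rate
    if chunk < track_bytes:
        skip_cost = rotate_past                     # head cannot skip inside a track
    else:
        skip_cost = min(rotate_past, skip_seek)
    striped = per_disk_data + skips * skip_cost
    return single_disk / striped


# Hypothetical drive: 60 MB/s media rate, 512 KiB tracks, 1 ms short seek.
for chunk in (64 * 1024, 4 * 2**20):
    print(chunk, round(striped_speedup(2**30, chunk, 512 * 1024, 60e6, 0.001), 2))
```

Under this model a 64 KiB chunk gives no speedup at all, while a multi-track chunk of 4 MiB approaches 2x, which is the point about chunks needing to span several tracks.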
> 
> Raid1 already does some read-balancing, though it is possible (even
> likely) that it doesn't balance very effectively.  Working out how
> best to do the balancing in general is a non-trivial task, but one that
> would be worth spending time on.
> 
> The raid10 module in linux supports a layout described as 'far=2'.
> In this layout, with two drives, the first half of the drives is used
> for a raid0, and the second half is used for a mirrored raid0 with the
> data on the other disk.
> In this layout reads should certainly go at raid0 speeds, though
> there is a cost in write speed.
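A simplified sketch of where the two copies of a chunk live in the 'far=2' layout described above. It works at chunk granularity, ignores md metadata, and assumes the far copy is rotated by one drive, so with two drives it always lands on the other disk. `far2_copies` is a hypothetical name, not md's code.

```python
def far2_copies(chunk, n_disks, chunks_per_half):
    """Return ((disk, physical_chunk), (disk, physical_chunk)) for a chunk.

    Assumed layout: the first half of every drive is a RAID0 stripe of the
    data ("near" copy); the second half repeats that stripe rotated by one
    drive ("far" copy), so both copies are on different drives.
    """
    row = chunk // n_disks                      # stripe row within a half
    if row >= chunks_per_half:
        raise ValueError("chunk beyond array capacity")
    near = (chunk % n_disks, row)
    far = ((chunk + 1) % n_disks, chunks_per_half + row)
    return near, far


# Reads can use the near copies, which form a plain RAID0 stripe: 0, 1, 0, 1.
print([far2_copies(c, 2, 100)[0][0] for c in range(4)])
```

Every write must update both copies, one in each half of the drives, which is a likely source of the write-speed cost.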
> 
> Maybe you would like to experiment.  Write a program that reads from
> two drives in parallel, reading all the 'odd' chunks from one drive
> and the 'even' chunks from the other, and find out how fast it is.
> Maybe you could get it to try lots of different chunk sizes and see
> which is the fastest.

Too artificial.  The results of this sort of test would not translate
well to real world usage.
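For reference, the experiment Neil suggests (which Doug considers too artificial to predict real-world behavior) could be sketched in userspace roughly as below. It assumes a Unix system (for `os.pread`) and two readable paths holding identical data. On real drives you would pass hypothetical device paths such as /dev/sdX and drop the page cache between runs (for example by writing 3 to /proc/sys/vm/drop_caches as root), otherwise cached data inflates the numbers. `read_interleaved` and `sweep` are illustrative names.

```python
import os
import threading
import time


def read_interleaved(path_a, path_b, total_bytes, chunk_size):
    """Read even chunks from path_a and odd chunks from path_b in parallel.

    Returns (elapsed_seconds, bytes_read_from_a, bytes_read_from_b).
    """
    counts = [0, 0]

    def worker(index, path):
        fd = os.open(path, os.O_RDONLY)
        try:
            offset = index * chunk_size         # even chunks start at 0, odd at chunk_size
            while offset < total_bytes:
                length = min(chunk_size, total_bytes - offset)
                data = os.pread(fd, length, offset)
                if not data:                    # hit end of file early
                    break
                counts[index] += len(data)
                offset += 2 * chunk_size        # skip the chunk the other drive serves
        finally:
            os.close(fd)

    threads = [threading.Thread(target=worker, args=(i, p))
               for i, p in enumerate((path_a, path_b))]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, counts[0], counts[1]


def sweep(path_a, path_b, total_bytes, chunk_sizes):
    """Time read_interleaved for each chunk size; returns {chunk_size: MB/s}."""
    results = {}
    for chunk in chunk_sizes:
        elapsed, a, b = read_interleaved(path_a, path_b, total_bytes, chunk)
        results[chunk] = (a + b) / elapsed / 1e6 if elapsed > 0 else float("inf")
    return results
```

Threads are enough here because CPython releases the GIL while `os.pread` waits on the kernel, so the two reads can overlap.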

> That might be quite helpful in understanding how to get read-balancing
> working well.

Doing *good* read balancing is hard, especially given things like FC
attached storage, iSCSI/iSER, etc.  If I wanted to do this right, I'd
start by teaching the md code to look more deeply into block devices,
possibly even with a self-tuning series of reads at startup to test
things like close-seek sequential operation times versus maximum-seek
throughput.  That would clue you in as to whether the device you are
talking to might have more than one physical spindle, which would affect
the cost you associate with seek-requiring operations relative to
bandwidth-heavy operations.  I might even go so far as to look into the
SCSI transport classes for clues about data throughput at bus bandwidth
versus command startup/teardown costs on the bus, so you have an accurate
idea of whether lots of outstanding small commands are likely to cause
your device to suffer bus starvation from overhead.  Then I'd use that
data to help me numerically quantify the load on a device, updated when
a command is added to the block layer queue (the queued load), again
when the command is actually removed from the block queue and sent to
the device (the active load), and again when the command comes back
from the device.  Then I'd basically look at what an incoming command
*would* do to each constituent disk's load values to decide which disk
it should go to.  But that's just off the top of my head and I may be
on crack...I didn't check what my wife handed me this morning.
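The bookkeeping described above is prose only. As a minimal sketch, assuming each request's cost is a fixed seek penalty (charged unless the request starts exactly where the previous one on that disk ended) plus a per-byte transfer cost, with both constants supplied by hand rather than measured by the startup probe mentioned above, a balancer could track queued and active load per mirror and send each read to the mirror whose load would be lowest afterwards. All class and method names are hypothetical; this is not md code.

```python
from dataclasses import dataclass

SECTOR = 512


@dataclass
class Disk:
    seek_cost: float        # assumed cost of a non-sequential request
    byte_cost: float        # assumed cost per byte transferred
    queued: float = 0.0     # load of requests still in the block-layer queue
    active: float = 0.0     # load of requests dispatched to the device
    next_sector: int = 0    # sector just past the last request queued here

    def cost_of(self, sector, nbytes):
        seek = 0.0 if sector == self.next_sector else self.seek_cost
        return seek + nbytes * self.byte_cost


@dataclass
class Request:
    disk: int
    cost: float


class LoadBalancer:
    def __init__(self, disks):
        self.disks = disks

    def submit(self, sector, nbytes):
        """Queue a read on the mirror whose total load would be lowest."""
        def load_after(i):
            d = self.disks[i]
            return d.queued + d.active + d.cost_of(sector, nbytes)

        best = min(range(len(self.disks)), key=load_after)  # ties go to lowest index
        d = self.disks[best]
        cost = d.cost_of(sector, nbytes)
        d.queued += cost
        d.next_sector = sector + nbytes // SECTOR  # simplification: track at queue time
        return Request(best, cost)

    def dispatch(self, req):
        """Request left the block queue and was sent to the device."""
        d = self.disks[req.disk]
        d.queued -= req.cost
        d.active += req.cost

    def complete(self, req):
        """Request came back from the device."""
        self.disks[req.disk].active -= req.cost
```

With two identical mirrors, a sequential follow-on read stays on the mirror already positioned for it, while a distant read goes to the idle mirror because the seek penalty makes it cheaper there.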

-- 
Doug Ledford <dledford@redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband

Thread overview: 14+ messages
2006-10-17  6:51 Propose of enhancement of raid1 driver Miroslaw Mieszczak
2006-10-19  3:28 ` Neil Brown
2006-10-19 16:06   ` Doug Ledford [this message]
2006-10-20  4:59   ` Jeff Breidenbach
2006-10-21 16:01   ` Tomasz Chmielewski
2006-10-23  1:33     ` Neil Brown
2006-10-23 15:28   ` Mario 'BitKoenig' Holbe
2006-10-28 17:31     ` Al Boldi
2006-10-28 20:30       ` Mario 'BitKoenig' Holbe
2006-10-30 15:44         ` Al Boldi
2006-10-30 17:02           ` Mario 'BitKoenig' Holbe
2006-10-30 17:55             ` Al Boldi
2006-10-30 17:59               ` Jeff Breidenbach
2006-10-30 20:50               ` Mario 'BitKoenig' Holbe
