From: NeilBrown <neilb@suse.de>
To: Shaohua Li <shli@kernel.org>
Cc: linux-raid@vger.kernel.org, axboe@kernel.dk
Subject: Re: [patch 0/4 v2] optimize raid1 read balance for SSD
Date: Thu, 28 Jun 2012 13:40:04 +1000 [thread overview]
Message-ID: <20120628134004.729fb77f@notabene.brown> (raw)
In-Reply-To: <20120628032901.GA21233@kernel.org>
On Thu, 28 Jun 2012 11:29:01 +0800 Shaohua Li <shli@kernel.org> wrote:
> On Thu, Jun 28, 2012 at 11:06:16AM +1000, NeilBrown wrote:
> > On Wed, 13 Jun 2012 17:09:22 +0800 Shaohua Li <shli@kernel.org> wrote:
> >
> > > raid1 read balance is an important algorithm for making read performance
> > > optimal. It's a distance-based algorithm: for each request dispatched, choose
> > > the disk whose last finished request is closest to the new request. This is
> > > great for hard disks, but SSDs have some special characteristics:
> > >
> > > 1. Nonrotational. Distance means nothing for an SSD, though merging small
> > > requests into big requests is still optimal. If there is no merging,
> > > distributing requests across the raid disks as widely as possible is better.
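For reference, the distance-based selection described above can be sketched as follows (illustrative names only, not the actual md code):

```c
#include <stdlib.h>

/* Classic distance-based read balance: among the mirrors, pick the disk
 * whose last completed request ended closest to the new read's sector.
 * head_pos[i] is the sector where disk i's head was left. Illustrative
 * only; the real md code tracks more per-disk state. */
static int closest_disk(const long *head_pos, int ndisks, long sector)
{
    int best = 0;
    long best_dist = labs(head_pos[0] - sector);

    for (int i = 1; i < ndisks; i++) {
        long dist = labs(head_pos[i] - sector);
        if (dist < best_dist) {
            best_dist = dist;
            best = i;
        }
    }
    return best;
}
```

On spinning disks this minimizes seek distance; on SSDs the distance is meaningless, which is the point of the patch series.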
> >
> > This, of course, has nothing to do with the devices rotating, and everything
> > to do with there being a seek-penalty. So why the block-device flag is
> > called "rotational" I really don't know :-(
> >
> > Can we make the flags that md uses be "seek_penalty" or "no_seek" or
> > something, even if the block layer has less meaningful names?
>
> It's purely borrowed from the block layer. I'm OK with changing it.
>
> > >
> > > 2. Getting too big a request isn't always optimal. For a hard disk, data
> > > transfer overhead is trivial compared to spindle movement, so we always
> > > prefer bigger requests. For an SSD, once the request size exceeds a certain
> > > value, performance no longer increases with request size. An example is
> > > readahead: if readahead merges requests too aggressively and leaves some
> > > disks idle, the performance is worse than when all disks are busy running
> > > small requests.
> >
> > I note that the patch doesn't actually split requests; rather, it avoids
> > allocating adjacent requests to the same device - is that right?
> > That seems sensible but doesn't seem to match your description so I wanted to
> > check.
>
> Yes, it doesn't do a split; sorry for the misleading description. I considered
> splitting but gave up on it later, and forgot to update the description. There
> are two types of IO that drive big request sizes: buffered reads (readahead),
> and buffered writes plus direct reads/writes. I assume buffered writes and
> direct reads/writes can drive a high iodepth, so splitting them is useless.
> Splitting buffered reads makes sense, though it's really 'prevent merging into
> a big request'.
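The 'prevent merging into a big request' idea could be sketched like this (hypothetical helper, illustrative parameter names; sizes in sectors):

```c
#include <stdbool.h>

/* Decide whether a new sequential read starting at 'sector' should stay
 * on the same mirror (so the block layer can merge it with the running
 * request) or move to another mirror to keep all disks busy.
 *
 * last_sector:    sector right after the previous request on this disk
 * done_sectors:   sectors already accumulated in the running request
 * opt_io_sectors: the device's optimal IO size
 */
static bool keep_on_same_disk(long last_sector, long done_sectors,
                              long sector, long opt_io_sectors)
{
    /* Not adjacent to the previous request: no merge is possible,
     * so there is no reason to prefer this disk. */
    if (sector != last_sector)
        return false;

    /* Adjacent, but the running request has already reached the
     * device's optimal IO size: spread the load instead of growing
     * the request further. */
    return done_sectors < opt_io_sectors;
}
```

Nothing is actually split; adjacent requests simply stop being steered to the same device once the merged request is big enough, which matches the behaviour discussed above.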
>
> > >
> > > The patches try to address these issues. The first two patches are cleanups.
> > > The third patch addresses the first item above; the fourth addresses the
> > > second. The idea can be applied to raid10 too, which is on my todo list.
> >
> > Thanks for these - there is a lot to like here.
> >
> > One concern that I have is that they assume all devices are the same:
> > they are all "rotational" or none of them are, and they all have the same
> > optimal IO size.
> >
> > md aims to work well on heterogeneous arrays, so I'd like to make this more
> > general if I can.
> > Whether to "split" adjacent requests or not is decided in the context of a
> > single device (the device that the first request is queued for), so that
> > should be able to use the optimal IO size of that device.
> >
> > General balancing is a little harder as the decision is made in the context
> > of all active devices. In particular we need to know how to choose between a
> > seek-penalty device and a no-seek-penalty device, if they both have requests
> > queued to them and the seek-penalty device is a long way from the target.
> >
> > Maybe:
> > - if the last request to some device is within optimal-io-size of this
> > request, then send this request to that device.
> > - if either of two possible devices has no seek penalty, choose the one with
> > the fewest outstanding requests.
> > - if both of two possible devices have a seek-penalty, then choose the
> > closest
> >
> > I think this will do the same as the current code for 'rotational' devices,
> > and will be close to what your code does for 'non-rotational' devices.
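The three rules proposed above could be sketched for a two-way mirror like this (hypothetical structure and field names, not the md data structures):

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-device state; field names are illustrative only. */
struct dev_state {
    long head_pos;      /* sector of the last completed request */
    int  pending;       /* outstanding requests queued to the device */
    bool seek_penalty;  /* true for spinning disks */
    long opt_io_size;   /* device's optimal IO size, in sectors */
};

/* Return the index (0 or 1) of the preferred mirror for a read at
 * 'sector', applying the three rules in order. */
static int choose_mirror(const struct dev_state *d, long sector)
{
    long dist0 = labs(d[0].head_pos - sector);
    long dist1 = labs(d[1].head_pos - sector);

    /* Rule 1: a device whose last request is within its own optimal
     * IO size of this one keeps the request (it may merge). */
    if (dist0 < d[0].opt_io_size)
        return 0;
    if (dist1 < d[1].opt_io_size)
        return 1;

    /* Rule 2: if either device has no seek penalty, pick the one
     * with the fewest outstanding requests. */
    if (!d[0].seek_penalty || !d[1].seek_penalty)
        return d[0].pending <= d[1].pending ? 0 : 1;

    /* Rule 3: both devices seek, so pick the closest head. */
    return dist0 <= dist1 ? 0 : 1;
}
```

Note how rule 1 uses the per-device optimal IO size, so the scheme works even when the two mirrors have different characteristics.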
>
> This is close to what I did, except for the case of one hard disk and one SSD.
> Talking about heterogeneous arrays, I assume people only build them with two
> different hard disks or two different SSDs. Will people really mix a hard disk
> and an SSD? A hard disk can only drive ~600 IOPS while a low-end SATA SSD can
> drive 20k ~ 40k IOPS. Adding 600 to 20k doesn't significantly change IOPS.
I'm fairly sure people will try it, even if it doesn't make sense. :-)
It possibly would make sense if the slower device was marked "write-mostly",
but that disables read-balance, so what we do with read balance might be
irrelevant.
I don't want to assume devices have different speeds unless explicitly told
so, and the "rotational" flag doesn't clearly imply anything about the speed
of the device, only about seek latency. So e.g. I wouldn't want to always
assume that a non-rotational device were faster than a rotational device,
even though that may be the case in lots of situations.
I'd be less concerned about heterogeneous arrays with RAID10 as it makes less
sense there. But I certainly want to at least acknowledge the possibility
for RAID1.
thanks,
NeilBrown
Thread overview: 10+ messages
2012-06-13 9:09 [patch 0/4 v2] optimize raid1 read balance for SSD Shaohua Li
2012-06-13 9:09 ` [patch 1/4 v2] raid1: move distance based read balance to a separate function Shaohua Li
2012-06-13 9:09 ` [patch 2/4 v2] raid1: make sequential read detection per disk based Shaohua Li
2012-06-13 9:09 ` [patch 3/4 v2] raid1: read balance chooses idlest disk Shaohua Li
2012-06-13 9:09 ` [patch 4/4 v2] raid1: split large request for SSD Shaohua Li
2012-06-28 1:06 ` [patch 0/4 v2] optimize raid1 read balance " NeilBrown
2012-06-28 1:47 ` Roberto Spadim
2012-06-28 3:29 ` Shaohua Li
2012-06-28 3:40 ` NeilBrown [this message]
2012-06-28 6:31 ` David Brown