Re: Questions about RAID and I/O scheduler

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Corrado Zoccolo <czoccolo@gmail.com>
To: Yuehai Xu <yuehaixu@gmail.com>
Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org,
	yhxu@wayne.edu, neilb@suse.de, jens.axboe@oracle.com
Subject: Re: Questions about RAID and I/O scheduler
Date: Fri, 2 Apr 2010 10:02:16 +0200	[thread overview]
Message-ID: <x2h4e5e476b1004020102gaf301a69oc4e0ffda85c2ce39@mail.gmail.com> (raw)
In-Reply-To: <m2v100eff551003311324tdc2e567atd2a48e35520e8601@mail.gmail.com>

On Wed, Mar 31, 2010 at 10:24 PM, Yuehai Xu <yuehaixu@gmail.com> wrote:
> Hi,
>
> I noticed that some one said NOOP is usually the default I/O scheduler
> for hardware RAID. Why not CFQ? Suppose there are just several
> sequential read processes, as I know CFQ will keep all the disk heads
> of hard raid to serve a process for a while(a time slice), in that
> case, CFQ should be the best of all I/O schedulers. Am I right?
> Generally, hard raid should have its own I/O scheduler in their
> firmware, in that case, the I/O scheduler of OS should do nothing
> except dispatch the requests as soon as possible, it is the hard raid
> itself to decide how to schedule these requests. From this point of
> view, NOOP should be the default one. I am really confused here.
Even single ncq disks have an I/O scheduler nowadays.
To clear the confusion, we should make a distinction between
work-conserving and non-work-conserving I/O schedulers.
    * A work-conserving scheduler (e.g. deadline, noop) is idle only
when there is no request pending
    * A non-work-conserving scheduler (e.g. CFQ, AS) may be idle at
any time, in an effort to improve request pattern locality or to
provide fairness.
A non-work-conserving scheduler in the host computer will generally
perform bad if the RAID also has a non-work-conserving scheduler,
because the decision to idle taken by the two schedulers could
conflict, causing disk utilization to drop needlessly. In that case,
NOOP or even better, deadline, could perform much better.

If the raid controller has a work-conserving I/O scheduler, instead
(single NCQ disks and cheap RAID cards typically have this kind of
schedulers), CFQ can effectively control the access pattern (by
queuing only the requests pertinent to the pattern and delaying the
others), and will take advantage when possible of the better
understanding of disk geometry by the lower level scheduler for some
kind of patterns (the ones for which we can submit multiple requests
in parallel, namely random access patterns).
In this case, we suggest to try CFQ and report if you see regressions
w.r.t. NOOP or deadline on some workloads, so we can tune it better.
>
> The next question is about the maximal number of disks in disk array,
> the fault tolerance should be one limitation because the more the
> number of disks, the higher chance of failure. However, may throughput
> also be one limitation? Do you know anyone use disk array which
> contains large number of disks to handle small requests? Such as 256
> disks to handle 4K requests?
You can use multiple disks to handle many parallel random requests.
You should check, though, the queue depth of your raid card, that
limits the actual number of requests issued in parallel.
If it is lower than the number of disks (e.g. it is 31 on SATA), then
the additional disks are wasted for random access patterns.

Thanks,
Corrado
>
> Thanks!
>
> Yuehai
>



-- 
__________________________________________________________________________

dott. Corrado Zoccolo                          mailto:czoccolo@gmail.com
PhD - Department of Computer Science - University of Pisa, Italy
--------------------------------------------------------------------------
The self-confidence of a warrior is not the self-confidence of the average
man. The average man seeks certainty in the eyes of the onlooker and calls
that self-confidence. The warrior seeks impeccability in his own eyes and
calls that humbleness.
                               Tales of Power - C. Castaneda

     prev parent reply	other threads:[~2010-04-02  8:02 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-03-31 20:24 Questions about RAID and I/O scheduler Yuehai Xu
2010-04-02  8:02 ` Corrado Zoccolo [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=x2h4e5e476b1004020102gaf301a69oc4e0ffda85c2ce39@mail.gmail.com \
    --to=czoccolo@gmail.com \
    --cc=jens.axboe@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=yhxu@wayne.edu \
    --cc=yuehaixu@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).