From: Greg Stark <gsstark@mit.edu>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Helge Hafting <helgehaf@aitel.hist.no>,
	jw schultz <jw@pegasys.ws>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: raid0 slower than devices it is assembled of?
Date: 07 Jan 2004 23:54:05 -0500
Message-ID: <87llojujki.fsf@stark.dyndns.tv>
In-Reply-To: <Pine.LNX.4.58.0312160825570.1599@home.osdl.org>


Linus Torvalds <torvalds@osdl.org> writes:

> The fact is, modern disks are GOOD at streaming data. They're _really_
> good at it compared to just about anything else they ever do. The win you
> get from even medium-sized stripes on RAID0 are likely to not be all that
> noticeable, and you can definitely lose _big_ just because it tends to
> hack your IO patterns to pieces.

I'm not sure how you reach this conclusion. 50MB/s may sound like a lot, and
it's certainly a whole lot more than the 2MB/s I get on this 486 over here.
But the hard drive I have that gets 50MB/s is also 250G, and the one in the
486 is 425M: a factor of 588 difference in size against a factor of 25 in
throughput. So as good as the drives are getting at streaming data, the
amount of data we want to stream is going up even faster.
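
To put rough numbers on that comparison, here is a back-of-the-envelope
sketch. It assumes decimal units (1G = 1000M) and that each drive sustains
its quoted rate over the whole surface, which real drives do not quite do;
the function name is only illustrative.

```python
# Rough time to stream each drive end to end, using the figures quoted above.
# Assumptions: decimal units (1G = 1000M), constant sustained transfer rate.

def full_read_seconds(capacity_mb, rate_mb_per_s):
    """Seconds needed to read capacity_mb megabytes at rate_mb_per_s."""
    return capacity_mb / rate_mb_per_s

old = full_read_seconds(425, 2)        # 425M drive in the 486 at 2MB/s
new = full_read_seconds(250_000, 50)   # 250G drive at 50MB/s

capacity_ratio = 250_000 / 425         # how much more data there is
rate_ratio = 50 / 2                    # how much faster it streams

print(f"old drive: {old:.1f}s, new drive: {new:.1f}s")
print(f"capacity grew {capacity_ratio:.0f}x, throughput grew {rate_ratio:.0f}x")
```

Under these assumptions the old drive can be read in full in a few minutes,
while the new one takes well over an hour.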

> My personal guess is that modern RAID0 stripes should be on the order of
> several MEGABYTES in size rather than the few hundred kB that most people
> use (not to mention the people who have 32kB stripes or smaller - they
> just kill their IO access patterns with that, and put the CPU at
> ridiculous strain).

> Big stripes help because:
> 
>  - disks already do big transfers well, so you shouldn't split them up.
>    Quite frankly, the kinds of access patterns that let you stream
>    multiple streams of 50MB/s and get N-way throughput increases just
>    don't exists in the real world outside of some very special niches (DoD
>    satellite data backup, or whatever).

Or just about any moderate sized SQL database. Virtually any large query will
cause what Oracle calls a "full table scan" and what postgres calls a
"sequential scan", precisely because reading sequential data is way faster
than random access. Often a single query will generate several such streams,
and often large on-disk sorts as well, which also have sequential access
patterns.

It seems to me that having a stripe size of several megabytes will defeat the
read-ahead and essentially limit the database to 50MB/s, the speed of a single
disk. That seems like a lot, but it really isn't fast enough to keep up with
the increase in the amount of data being handled. Even a small database with
tables around 1GB will benefit enormously from being able to stream the data
at 100MB/s or 150MB/s.
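
A small sketch of the model behind that worry. It assumes the simplest RAID0
layout (chunk k lives on disk k mod N), a fixed 128kB read-ahead window, and
a 3-disk array at 50MB/s per disk. The window size, the disk count, and the
helper names are illustrative assumptions, not measured values, and real
kernel read-ahead is adaptive rather than fixed.

```python
# Simplified RAID0 model: chunk k of the array lives on member disk k % n_disks.
# The 128 kB read-ahead window and the 3-disk array are illustrative assumptions.

KB = 1024
MB = 1024 * KB

def disks_touched(offset, length, chunk_size, n_disks):
    """Member disks hit by one contiguous read of `length` bytes at `offset`."""
    first_chunk = offset // chunk_size
    last_chunk = (offset + length - 1) // chunk_size
    return {chunk % n_disks for chunk in range(first_chunk, last_chunk + 1)}

def scan_seconds(table_mb, per_disk_mb_s, disks_in_parallel):
    """Time to stream a table if that many disks each deliver per_disk_mb_s."""
    return table_mb / (per_disk_mb_s * disks_in_parallel)

READAHEAD = 128 * KB  # assumed window size, not a measured kernel value

for chunk_size in (32 * KB, 4 * MB):
    busy = len(disks_touched(0, READAHEAD, chunk_size, n_disks=3))
    print(chunk_size // KB, "kB chunks:", busy, "disk(s) busy,",
          round(scan_seconds(1000, 50, busy), 1), "s for a 1GB table")
```

In this model a 128kB window spans all three disks with 32kB chunks but sits
inside a single disk with 4MB chunks, so the 1GB scan takes about 6.7s in the
first case and 20s in the second. A larger read-ahead window would change the
picture, which is why this needs measuring rather than modelling.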

> I may be wrong, of course. But I doubt it.

Well, it should be easy enough to test. It would be quite a radical change in
thinking. All the raidtools documentation suggests starting with 32kB and
experimenting -- largely with smaller stripe sizes. I've certainly never
considered anything much larger. It would be really interesting to know how
even a typical database query runs on RAID arrays of varying stripe sizes.

-- 
greg

