From: jw schultz <jw@pegasys.ws>
To: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: raid0 slower than devices it is assembled of?
Date: Wed, 17 Dec 2003 18:47:13 -0800
Message-ID: <20031218024713.GG9137@pegasys.ws>
In-Reply-To: <20031217192244.GB12121@mail.shareable.org>

On Wed, Dec 17, 2003 at 07:22:44PM +0000, Jamie Lokier wrote:
> Linus Torvalds wrote:
> > My personal guess is that modern RAID0 stripes should be on the order of
> > several MEGABYTES in size rather than the few hundred kB that most people
> > use (not to mention the people who have 32kB stripes or smaller - they
> > just kill their IO access patterns with that, and put the CPU at
> > ridiculous strain).
> 
> If a large fs-level I/O transaction is split into lots of 32k
> transactions by the RAID layer, many of those 32k transactions will be
> contiguous on the disks.
> 
> That doesn't mean they're contiguous from the fs point of view, but
> given that all modern hardware does scatter-gather, shouldn't the
> contiguous transactions be merged before being sent to the disk?
> 
> It may strain the CPU (splitting and merging in a different order lots
> of requests), but I don't see why it should kill I/O access patterns,
> as they can be as large as if you had large stripes in the first place.

Only now, instead of the latency of one disk seeking to
service the request, you have the worst-case latency of all
the disks.
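
As a rough illustration of that worst-case effect, here is a
minimal sketch.  It assumes each disk's positioning latency is
independent and uniformly distributed between 0 and some
maximum; that model and the function name are assumptions for
illustration, not measurements.  Under that assumption the
expected slowest of n disks is n/(n+1) of the maximum, against
1/2 for a single disk.

```python
def expected_worst_latency(n_disks: int, max_latency_ms: float) -> float:
    """Expected maximum of n independent latencies, each assumed
    uniform on [0, max_latency_ms] (a simplifying assumption)."""
    if n_disks < 1:
        raise ValueError("need at least one disk")
    # For n i.i.d. uniform(0, T) variables, E[max] = T * n / (n + 1).
    return max_latency_ms * n_disks / (n_disks + 1)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, expected_worst_latency(n, 10.0))
```

Drives that run nearly in lock-step are strongly correlated, so
the independent model overstates the penalty for them.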

Years ago I had an outboard SCSI HW RAID-5 array of 5 disks
on two chains.  The controller used a 512 byte chunk, so a
stripe held 2KB of data.  A single 2KB read would flash
lights on 4 drives simultaneously.  An aligned 2KB write
would calculate parity without any reads and write to all 5
at once.  Any I/O 4KB or larger would engage all 5 drives in
parallel.  Given that the OS in question had a 2KB page size
and the filesystems had a 2KB block size, it worked pretty
well.  When I spec'd the array I made sure the stripe size
would align with access -- one drive more or less and the
whole thing would have been a disaster.
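
The arithmetic for that array can be sketched as follows.  The
5 disks and 512 byte chunk come from the description above; the
rotating parity placement in parity_disk() is an assumed layout
chosen for illustration, since the controller's real layout is
not known, and all function names are hypothetical.

```python
N_DISKS = 5                           # from the array described above
CHUNK = 512                           # bytes per chunk
DATA_PER_STRIPE = N_DISKS - 1         # one chunk per stripe holds parity
STRIPE_BYTES = CHUNK * DATA_PER_STRIPE  # 2048 bytes of data per stripe


def parity_disk(stripe: int) -> int:
    """Assumed rotating-parity layout (not the controller's known one)."""
    return (N_DISKS - 1) - (stripe % N_DISKS)


def drives_read(offset: int, length: int) -> set:
    """Set of drives holding the data chunks of a read."""
    drives = set()
    first = offset // CHUNK
    last = (offset + length - 1) // CHUNK
    for chunk in range(first, last + 1):
        stripe, idx = divmod(chunk, DATA_PER_STRIPE)
        data_disks = [d for d in range(N_DISKS) if d != parity_disk(stripe)]
        drives.add(data_disks[idx])
    return drives


def is_full_stripe_write(offset: int, length: int) -> bool:
    """True when parity can be computed from the new data alone."""
    return length > 0 and offset % STRIPE_BYTES == 0 and length % STRIPE_BYTES == 0


if __name__ == "__main__":
    print(len(drives_read(0, 2048)))        # aligned 2KB read
    print(len(drives_read(0, 4096)))        # aligned 4KB read
    print(is_full_stripe_write(0, 2048))    # aligned 2KB write
    print(is_full_stripe_write(512, 2048))  # misaligned 2KB write
```

With one drive more or less, STRIPE_BYTES would no longer be
2048, and 2KB pages would stop lining up with whole stripes.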

At that time the xfer rate of the drives was a fraction of
what it is today, and this setup allowed the array to
saturate the SCSI connection to the host, which is
something the drives could not do individually.  However,
disk latency was the worst case of the drives, although
since they ran almost in lock-step it wasn't much longer
than single-drive latency.  This was just one step up from
RAID-3.

Today xfer rates are an order of magnitude higher while
latency has not shrunk.  In fact, by reducing platter count
many drives today have worse latency.  I don't think I'd
ever recommend such a small stripe size today: the latency
of handshaking and the overhead of splitting and merging
would outweigh the bandwidth gains in all but a few rare
applications.
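
A back-of-the-envelope model of that trade-off, as a sketch
only: service time is taken to be one positioning latency plus
the time to transfer each disk's share of the request.  The
function name and all figures in the example are hypothetical,
and the model ignores the splitting and merging overhead, which
would only make small stripes look worse.

```python
def service_time_ms(request_bytes: int, n_disks: int,
                    latency_ms: float, rate_mb_s: float) -> float:
    """Latency plus transfer time when a request is split evenly
    across n_disks, each sustaining rate_mb_s (10**6 bytes/s)."""
    per_disk_bytes = request_bytes / n_disks
    transfer_ms = per_disk_bytes / (rate_mb_s * 1e6) * 1e3
    return latency_ms + transfer_ms


if __name__ == "__main__":
    # Hypothetical figures: 64KB request, 8 ms latency.
    for rate in (5.0, 50.0):  # slow drive vs. one 10x faster
        one = service_time_ms(64 * 1024, 1, 8.0, rate)
        four = service_time_ms(64 * 1024, 4, 8.0, rate)
        print(rate, round(one, 2), round(four, 2))
```

As the transfer rate rises with latency held fixed, the
transfer term shrinks and splitting the request across more
disks saves less and less of the total.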


-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt
