Re: Odd (slow) RAID performance

From: Bill Davidsen <davidsen@tmr.com>
To: Mark Hahn <hahn@physics.mcmaster.ca>
Cc: linux-raid@vger.kernel.org
Subject: Re: Odd (slow) RAID performance
Date: Tue, 19 Dec 2006 23:05:41 -0500
Message-ID: <4588B695.4000203@tmr.com>
In-Reply-To: <Pine.LNX.4.64.0612131227560.26047@coffee.psychology.mcmaster.ca>

Mark Hahn wrote:
>>>> which is right at the edge of what I need. I want to read the doc on
>>>> stripe_cache_size before going huge; if that's K, 10MB is a LOT of
>>>> cache when 256 works perfectly in RAID-0.
>
> but they are basically unrelated.  in r5/6, the stripe cache is
> absolutely critical in caching parity chunks.  in r0, it never
> functions this way, though it may help some workloads a bit (IOs which
> aren't naturally aligned to the underlying disk layout.)
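
For the sizing question in the quoted text, a rough Python sketch follows.
It assumes the commonly cited rule that stripe_cache_size counts cache
entries, each holding one page per member device, so the memory used is
about page_size * ndisks * stripe_cache_size. The 4096-byte page size and
the sample values are assumptions for illustration, not numbers from this
thread.

```python
def stripe_cache_bytes(stripe_cache_size, ndisks, page_size=4096):
    """Approximate RAM used by the md stripe cache.

    Assumes one page per member device per cache entry; page_size is an
    assumed default and differs between architectures.
    """
    return stripe_cache_size * ndisks * page_size

# Hypothetical settings for a three-drive array.
for entries in (256, 1024, 8192):
    mib = stripe_cache_bytes(entries, ndisks=3) / (1024 * 1024)
    print(f"stripe_cache_size={entries}: about {mib:.0f} MiB")
```

Under these assumptions the setting is a count of entries rather than
kilobytes, so the cost grows with the number of drives.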
>
>>>> Any additional input appreciated, I would expect the speed to be
>>>> (Ndisk - 1)*SingleDiskSpeed without a huge buffer, so the fact that
>>>> it isn't
>
> as others have reported, you can actually approach that with "naturally"
> aligned and sized writes.
I don't know what would be natural. I have three drives and a 256K chunk
size, and was originally testing with 1MB writes. I have a hard time
seeing a case where there would be a need to read-alter-rewrite: each
chunk should be writable as data1, data2, and parity, without readback.
I was writing directly to the array, so the data should start on a chunk
boundary. Until I went very large on stripe_cache_size, performance was
almost exactly 100% of the write speed of a single drive. There is no
obvious way to explain that other than writing one drive at a time.
Shrinking the write size by factors of two resulted in decreasing
performance, down to about 13% of the speed of a single drive. Such
performance just isn't useful, and going to RAID-10 eliminated the
problem, indicating that the RAID-5 implementation is the cause.
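
To make the arithmetic above concrete, here is a minimal Python sketch. It
assumes the chunk size is 256 KiB and that RAID-5 on three drives leaves
two data chunks per stripe. The function name classify_write is made up
for illustration and is not part of md; it only counts how much of a
write covers whole stripes and how much falls into partial stripes that
would need a read-modify-write.

```python
KIB = 1024

def classify_write(offset, length, ndisks, chunk_bytes):
    """Return (full_stripes, partial_bytes) for one RAID-5 write.

    A stripe holds (ndisks - 1) data chunks plus one parity chunk, so the
    data portion of a stripe is (ndisks - 1) * chunk_bytes.
    """
    stripe_data = (ndisks - 1) * chunk_bytes
    end = offset + length
    # First stripe boundary at or after the start of the write.
    first_full = -(-offset // stripe_data) * stripe_data
    # Last stripe boundary at or before the end of the write.
    last_full = (end // stripe_data) * stripe_data
    if last_full <= first_full:
        return 0, length  # the write never covers a whole stripe
    full_stripes = (last_full - first_full) // stripe_data
    partial_bytes = (first_full - offset) + (end - last_full)
    return full_stripes, partial_bytes

# Three drives, 256 KiB chunks, 1 MiB write starting at offset 0.
print(classify_write(0, 1024 * KIB, 3, 256 * KIB))
# The same write shifted by 4 KiB no longer lines up with the stripes.
print(classify_write(4 * KIB, 1024 * KIB, 3, 256 * KIB))
```

With these assumptions an aligned 1MB write is exactly two full stripes
with nothing left over, which is why no readback should be required.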
>
>> I'm doing the tests writing 2GB of data to the raw array, in 1MB 
>> writes. The array is RAID-5 with 256 chunk size. I wouldn't really 
>> expect any reads,
>
> but how many disks?  if your 1M writes are to 4 data disks, you stand 
> a chance of streaming (assuming your writes are naturally aligned, or 
> else you'll be somewhat dependent on the stripe cache.)
> in other words, your whole-stripe size is ndisks*chunksize, and for 
> 256K chunks and, say, 14 disks, that's pretty monstrous...
Three drives, so they could be totally isolated from other i/o.
>
> I think that's a factor often overlooked - large chunk sizes, especially
> with r5/6 AND lots of disks, mean you probably won't ever do "blind"
> updates, and thus need the r/m/w cycle.  in that case, if the stripe
> cache is not big/smart enough, you'll be limited by reads.
I didn't have lots of disks, and when the data and parity are all being 
updated in full chunk increments, there's no reason for a read, since 
the data won't be needed. I agree that it's probably being read, but 
needlessly.
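
The difference between the two cases can be sketched in Python. This is
an illustration of XOR parity in general, not the md code; the helper
names are hypothetical. A full-stripe write computes parity from the new
data alone, while a single-chunk update needs the old data and old parity.

```python
def xor_chunks(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def full_stripe_parity(data_chunks):
    """Parity for a full-stripe write: XOR of the new data chunks only."""
    parity = bytes(len(data_chunks[0]))
    for chunk in data_chunks:
        parity = xor_chunks(parity, chunk)
    return parity

def partial_update_parity(old_parity, old_chunk, new_chunk):
    """Parity after rewriting one chunk. The old chunk and old parity must
    be read from disk unless they are already in the stripe cache."""
    return xor_chunks(xor_chunks(old_parity, old_chunk), new_chunk)

# Tiny 8-byte "chunks" standing in for data1 and data2 on a 3-drive array.
data1, data2 = b"\x0f" * 8, b"\xf0" * 8
parity = full_stripe_parity([data1, data2])  # no reads required

# Rewriting only data1 gives the same parity as a fresh full-stripe write,
# but only by consuming the old data1 and the old parity.
new_data1 = b"\xaa" * 8
updated = partial_update_parity(parity, data1, new_data1)
assert updated == full_stripe_parity([new_data1, data2])
```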
>
> I'd like to experiment with this, to see how much benefit you really 
> get from using larger chunk sizes.  I'm guessing that past 32K
> or so, normal *ata systems don't speedup much.  fabrics with higher 
> latency or command/arbitration overhead would want larger chunks.
>
>> tried was 2K blocks, so I can try other sizes. I have a hard time 
>> picturing why smaller sizes would be better, but that's what testing 
>> is for.
>
> larger writes (from user-space) generally help, probably up to MB's.
> smaller chunks help by making it more likely to do blind parity updates;
> a larger stripe cache can help that too.
I tried sizes from 256B to 1MB; 1MB was best, or more correctly, least
unacceptable.
>
> I think I recall an earlier thread regarding how the stripe cache is used
> somewhat naively - that all IO goes through it.  the most important 
> blocks would be parity and "ends" of a write that partially update an 
> underlying chunk.  (conversely, don't bother caching anything which 
> can be blindly written to disk.) 
I fear that last parenthetical isn't being observed.

If it weren't for RAID-1 and RAID-10 being fast I wouldn't complain 
about RAID-5.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979
