From: Corey Hickey <bugfood-ml@fatooh.org>
To: linux-raid@vger.kernel.org
Subject: Re: Odd (slow) RAID performance
Date: Thu, 07 Dec 2006 17:15:48 -0800
Message-ID: <4578BCC4.5010509@fatooh.org>
In-Reply-To: <4578387D.4010209@tmr.com>

Bill Davidsen wrote:
> Dan Williams wrote:
>> On 12/1/06, Bill Davidsen <davidsen@tmr.com> wrote:
>>> Thank you so much for verifying this. I do keep enough room on my drives
>>> to run tests by creating any kind of whatever I need, but the point is
>>> clear: with N drives striped the transfer rate is N x base rate of one
>>> drive; with RAID-5 it is about the speed of one drive, suggesting that
>>> the md code serializes writes.
>>>
>>> If true, BOO, HISS!
>>>
>>> Can you explain and educate us, Neil? This looks like terrible
>>> performance.
>>>
>> Just curious what is your stripe_cache_size setting in sysfs?
>>
>> Neil, please include me in the education if what follows is incorrect:
>>
>> Read performance in kernels up to and including 2.6.19 is hindered by
>> needing to go through the stripe cache. This situation should improve
>> with the stripe-cache-bypass patches currently in -mm. As Raz
>> reported in some cases the performance increase of this approach is
>> 30% which is roughly equivalent to the performance difference I see of
>> a 4-disk raid5 versus a 3-disk raid0.
>>
>> For the write case I can say that MD does not serialize writes, if by
>> serialize you mean that there is a 1:1 correlation between writes to the
>> parity disk and writes to a data disk. To illustrate, I instrumented
>> MD to count how many times it issued a write to the parity disk and
>> compared that to how many writes it performed to the member disks for
>> the workload "dd if=/dev/zero of=/dev/md0 bs=1024k count=100". I
>> recorded 8544 parity writes and 25600 member disk writes which is
>> about 3 member disk writes per parity write, or pretty close to
>> optimal for a 4-disk array. So, serialization is not the cause,
>> performing sub-stripe width writes is not the cause as >98% of the
>> writes happened without needing to read old data from the disks.
>> However, I see the same performance on my system, about equal to a
>> single disk.
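
As a rough check on the counts quoted above, the arithmetic can be sketched in a few lines. The 4 KiB page size and the one-write-per-page model are my assumptions, not something stated in the thread; under them, the 25600 member writes match the number of data pages in the dd run, and the parity count sits just above the full-stripe ideal.

```python
# Sanity check of the write counts quoted above.
# Assumptions (not stated in the thread): 4 KiB pages, one write per page.
PAGE_SIZE = 4096
DATA_DISKS = 3  # a 4-disk RAID-5 has 3 data chunks per stripe

bytes_written = 100 * 1024 * 1024  # dd bs=1024k count=100
expected_member_writes = bytes_written // PAGE_SIZE
ideal_parity_writes = expected_member_writes / DATA_DISKS

observed_member_writes = 25600
observed_parity_writes = 8544

# Data writes per parity write; 3.0 would be the full-stripe ideal.
ratio = observed_member_writes / observed_parity_writes
# Fraction of parity writes beyond the full-stripe ideal.
excess = observed_parity_writes / ideal_parity_writes - 1

print(expected_member_writes)  # 25600
print(f"data writes per parity write: {ratio:.3f}")
print(f"parity writes above ideal: {excess:.2%}")
```

This only confirms that the counts are consistent with near-full-stripe writes; as noted below, it says nothing about whether the writes were issued in parallel.
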
>
> But the number of writes isn't an indication of serialization. If I
> write disk A, then B, then C, then D, you can't tell if I waited for
> each write to finish before starting the next, or did them in parallel.
> And since the write speed is equal to the speed of a single drive,
> effectively that's what happens, even though I can't see it in the code.

For what it's worth, my read and write speeds on a 5-disk RAID-5 are
somewhat faster than the speed of any single drive. The array is a
mixture of two different SATA drives and one IDE drive.

Sustained individual read performances range from 56 MB/sec for the IDE
drive to 68 MB/sec for the faster SATA drives. I can read from the
RAID-5 at about 100 MB/sec.

I can't give precise numbers for write speeds, except to say that I can
write to a file on the filesystem (which is mostly full and probably
somewhat fragmented) at about 83 MB/sec.

None of those numbers are equal to the theoretical maximum performance,
so I see your point, but they're still faster than one individual disk.

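To put a rough number on "theoretical maximum", here is a small sketch. It assumes an idealized streaming model in which a RAID-5 of N disks moves data at (N - 1) times the rate of its slowest member; that model and the function name are mine, and real arrays will fall short of it for many reasons.

```python
def raid5_ideal_sequential_rate(n_disks, slowest_mb_s):
    """Idealized streaming bound: N-1 data disks, each paced by the slowest member."""
    if n_disks < 3:
        raise ValueError("RAID-5 needs at least 3 disks")
    return (n_disks - 1) * slowest_mb_s


# 5 disks, slowest member (the IDE drive) at 56 MB/sec.
bound = raid5_ideal_sequential_rate(5, 56)

for label, observed in (("read", 100), ("write", 83)):
    share = observed / bound
    print(f"{label}: {observed} MB/sec is {share:.0%} of the {bound} MB/sec bound")
```

By this estimate both figures are well under half of the bound, while still above the 68 MB/sec of the fastest single drive.
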
> I also suspect that writes are not being combined, since writing the 2GB
> test runs at one-drive speed writing 1MB blocks, but floppy speed
> writing 2k blocks. And no, I'm not running out of CPU to do the
> overhead, it jumps from 2-4% to 30% of one CPU, but on an unloaded SMP
> system it's not CPU bound.
>>
>> Here is where I step into supposition territory. Perhaps the
>> discrepancy is related to the size of the requests going to the block
>> layer. raid5 always makes page sized requests with the expectation
>> that they will coalesce into larger requests in the block layer.
>> Maybe we are missing coalescing opportunities in raid5 compared to
>> what happens in the raid0 case? Are there any io scheduler knobs to
>> turn along these lines?
>
> Good thought. I had already tried that but not reported it; changing
> schedulers makes no significant difference: in the range of 2-3%, which
> is close to the measurement jitter due to head position or whatever.
>
> I changed my swap to RAID-10, but RAID-5 just can't keep up with
> 70-100MB/s data bursts which I need. I'm probably going to scrap
> software RAID and go back to a controller, the write speeds are simply
> not even close to what they should be. I have one more thing to try, a
> tool I wrote to chase another problem a few years ago. I'll report if I
> find something.

I have read that using RAID to stripe swap space is ill-advised, or at
least unnecessary. The kernel will stripe multiple swap devices if you
assign them the same priority.

http://tldp.org/HOWTO/Software-RAID-HOWTO-2.html

If you've been using RAID-10 for swap, then I think you could just
assign multiple RAID-1 devices the same swap priority for the same
effect with (perhaps) less overhead.

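One way to check which swap areas share a priority is to group the entries in /proc/swaps. The sketch below parses a hypothetical sample (the device names, sizes, and priorities are made up for illustration); on a real system the text would come from reading /proc/swaps, and the priorities themselves are set with `swapon -p` or the `pri=` option in /etc/fstab.

```python
from collections import defaultdict

# Hypothetical /proc/swaps contents; device names and sizes are made up.
SAMPLE = """\
Filename                                Type            Size    Used    Priority
/dev/md1                                partition       1048568 0       1
/dev/md2                                partition       1048568 0       1
/dev/sdc2                               partition       524280  0       -2
"""


def swap_by_priority(text):
    """Group swap areas by priority (the last column of /proc/swaps)."""
    groups = defaultdict(list)
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # ignore blank or malformed lines
        groups[int(fields[-1])].append(fields[0])
    return dict(groups)


groups = swap_by_priority(SAMPLE)
# Areas that share a priority are the ones the kernel allocates from in turn.
striped = {prio: devs for prio, devs in groups.items() if len(devs) > 1}
print(striped)  # {1: ['/dev/md1', '/dev/md2']}
```

In this made-up sample the two RAID-1 devices share priority 1, which is the arrangement I am suggesting in place of RAID-10.
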
-Corey