From: Bill Davidsen <davidsen@tmr.com>
To: "Raz Ben-Jehuda(caro)" <raziebe@gmail.com>
Cc: Neil Brown <neilb@suse.de>,
Linux RAID Mailing List <linux-raid@vger.kernel.org>
Subject: Re: raid5 write performance
Date: Sat, 31 Mar 2007 17:28:33 -0400
Message-ID: <460ED281.2020409@tmr.com>
In-Reply-To: <5d96567b0703301444j9b416c2nbc5ce27487eef5bc@mail.gmail.com>

Raz Ben-Jehuda(caro) wrote:
> Please see below.
>
> On 8/28/06, Neil Brown <neilb@suse.de> wrote:
>> On Sunday August 13, raziebe@gmail.com wrote:
>> > well ... me again
>> >
>> > Following your advice....
>> >
>> > I added a deadline for every WRITE stripe head when it is created.
>> > In raid5_activate_delayed I checked if the deadline is expired and, if
>> > not, I am setting the sh to prereadactive mode as .
>> >
>> > This small fix (and in a few other places in the code) reduced the
>> > amount of reads to zero with dd, but with no improvement to throughput.
>> > But with random access to the raid (buffers are aligned by the stripe
>> > width and with the size of stripe width) there is an improvement of at
>> > least 20%.
>> >
>> > Problem is that a user must know what he is doing, else there would be
>> > a reduction in performance if the deadline is too long (say 100 ms).
>>
>> So if I understand you correctly, you are delaying write requests to
>> partial stripes slightly (your 'deadline') and this is sometimes
>> giving you a 20% improvement ?
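
The delay idea summarized above can be illustrated with a toy model. This is a sketch under stated assumptions, not the kernel patch: the names StripeHead and simulate are hypothetical, time is a plain float, and the read cost of a partial-stripe write is taken as the cheaper of read-modify-write and reconstruct-write.

```python
from dataclasses import dataclass, field


@dataclass
class StripeHead:
    """Toy stand-in for a RAID5 stripe head (hypothetical, not the kernel struct)."""
    data_chunks: int          # data chunks per stripe: n - 1 for n disks
    deadline: float           # absolute time after which we stop waiting
    dirty: set = field(default_factory=set)

    def is_full(self):
        return len(self.dirty) == self.data_chunks


def simulate(writes, data_chunks, delay):
    """writes: (time, stripe, chunk) tuples sorted by time.

    Returns (full_stripe_writes, partial_writes, chunk_reads).
    """
    pending = {}
    full = partial = reads = 0

    def flush(sh):
        nonlocal full, partial, reads
        if sh.is_full():
            full += 1         # parity computed from new data alone, no reads
            return
        partial += 1
        rmw = len(sh.dirty) + 1             # old data chunks plus old parity
        rcw = data_chunks - len(sh.dirty)   # the untouched data chunks
        reads += min(rmw, rcw)

    for now, stripe, chunk in writes:
        # Stripes whose deadline has passed are written out as they are.
        for key in [k for k, sh in pending.items() if sh.deadline <= now]:
            flush(pending.pop(key))
        sh = pending.get(stripe)
        if sh is None:
            sh = pending[stripe] = StripeHead(data_chunks, now + delay)
        sh.dirty.add(chunk)
        if sh.is_full():
            flush(pending.pop(stripe))      # no reason to wait any longer
    for sh in pending.values():
        flush(sh)
    return full, partial, reads


# 3-disk array (2 data chunks per stripe), times in seconds.
writes = [(0.000, 0, 0), (0.004, 0, 1), (0.020, 1, 0), (0.040, 1, 1)]
print(simulate(writes, data_chunks=2, delay=0.010))  # (1, 2, 2)
print(simulate(writes, data_chunks=2, delay=0.030))  # (2, 0, 0)
```

In this model a longer delay removes reads only when the rest of the stripe actually arrives in time; the cost of waiting itself (added latency) is not modeled.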
>>
>> I'm not surprised that you could get some improvement. 20% is quite
>> surprising. It would be worth following through with this to make
>> that improvement generally available.
>>
>> As you say, picking a time in milliseconds is very error prone. We
>> really need to come up with something more natural.
>> I had hoped that the 'unplug' infrastructure would provide the right
>> thing, but apparently not. Maybe unplug is just being called too
>> often.
>>
>> I'll see if I can duplicate this myself and find out what is really
>> going on.
>>
>> Thanks for the report.
>>
>> NeilBrown
>>
>
> Hello Neil. I am sorry for the long delay; I was abruptly assigned to
> a different project.
>
> 1.
> I took another look at the raid5 delay patch I wrote a while
> ago. I ported it to 2.6.17 and tested it. It appears to work,
> and when used correctly it eliminates the read penalty.
>
> 2. Benchmarks.
> Configuration:
> I am testing a raid5 of 3 disks with 1MB chunk size. IOs are
> synchronous and non-buffered (O_DIRECT), 2 MB in size and always
> aligned to the beginning of a stripe. Kernel is 2.6.17. The
> stripe_delay was set to 10ms.
>
> Attached is the simple_write code.
>
> Command:
> simple_write /dev/md1 2048 0 1000
> simple_write does raw (O_DIRECT) sequential writes, starting
> from offset zero, of 2048 kilobytes each, 1000 times.
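
The attached simple_write program is not reproduced in this message. As a rough stand-in only, the following sketch does the same kind of I/O as described: sequential writes of a fixed size from a page-aligned buffer, opened with O_DIRECT where the platform provides it. The function name mirrors the command, but the argument order, the `direct` switch and the fill pattern are assumptions, not the original code.

```python
import mmap
import os


def simple_write(path, size_kb, offset_kb, count, direct=True):
    """Write `count` buffers of `size_kb` KiB sequentially, starting at `offset_kb`.

    Hypothetical stand-in for the attached tool. Returns total bytes written.
    WARNING: this overwrites data on the target.
    """
    size = size_kb * 1024
    flags = os.O_WRONLY
    if direct:
        # O_DIRECT is Linux-specific; fall back to buffered I/O elsewhere.
        flags |= getattr(os, "O_DIRECT", 0)

    # An anonymous mmap is page-aligned, which O_DIRECT requires of the buffer.
    buf = mmap.mmap(-1, size)
    buf.write(b"\xa5" * size)

    fd = os.open(path, flags)
    try:
        os.lseek(fd, offset_kb * 1024, os.SEEK_SET)
        written = 0
        for _ in range(count):
            written += os.write(fd, buf)   # synchronous, one request at a time
        return written
    finally:
        os.close(fd)
        buf.close()
```

Under these assumptions, the command quoted above would correspond to `simple_write("/dev/md1", 2048, 0, 1000)`.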
>
> Benchmark Before patch
>
> sda    1848.00    8384.00    50992.00     8384     50992
> sdb    1995.00   12424.00    51008.00    12424     51008
> sdc    1698.00    8160.00    51000.00     8160     51000
> sdd       0.00       0.00        0.00        0         0
> md0       0.00       0.00        0.00        0         0
> md1     450.00       0.00   102400.00        0    102400
>
>
> Benchmark After patch
>
> sda     389.11       0.00   128530.69        0    129816
> sdb     381.19       0.00   129354.46        0    130648
> sdc     383.17       0.00   128530.69        0    129816
> sdd       0.00       0.00        0.00        0         0
> md0       0.00       0.00        0.00        0         0
> md1    1140.59       0.00   259548.51        0    262144
>
> As one can see, no additional reads were done. One can actually
> calculate the raid's utilization: (n-1)/n * ( single disk throughput
> with 1M writes ).
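
As a quick cross-check of the two tables, the sketch below sums the member-disk rates and compares the array's write rate with the total written to the member disks. It assumes the third and fourth fields of each row are read and write rates in the same unit; the variable and function names are illustrative only.

```python
# (read rate, write rate) per device, copied from the two tables above.
before = {"sda": (8384.00, 50992.00), "sdb": (12424.00, 51008.00),
          "sdc": (8160.00, 51000.00), "md1": (0.00, 102400.00)}
after = {"sda": (0.00, 128530.69), "sdb": (0.00, 129354.46),
         "sdc": (0.00, 128530.69), "md1": (0.00, 259548.51)}


def summarize(sample, members=("sda", "sdb", "sdc"), array="md1"):
    """Return (total member reads, array writes / total member writes)."""
    member_reads = sum(sample[d][0] for d in members)
    member_writes = sum(sample[d][1] for d in members)
    return member_reads, sample[array][1] / member_writes


n = 3                      # disks in the array
ideal = (n - 1) / n        # fraction of member writes that carry data

before_reads, before_ratio = summarize(before)
after_reads, after_ratio = summarize(after)
speedup = after["md1"][1] / before["md1"][1]

print(f"member reads: before={before_reads:.0f} after={after_reads:.0f}")
print(f"data/total writes: before={before_ratio:.3f} after={after_ratio:.3f} "
      f"ideal={ideal:.3f}")
print(f"array write speedup: {speedup:.2f}x")
```

Both runs write close to the ideal 2/3 data fraction; the difference is that the extra reads on the member disks disappear after the patch, and the array write rate rises about 2.5 times.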
>
>
> 3. The patch code.
> The kernel tested above was 2.6.17. The patch is against 2.6.20.2
> because I noticed big code differences between 17 and 20.x.
> This patch was not tested on 2.6.20.2 but it is essentially the same. I
> have not tested (yet) degraded mode or any other non-common paths.

My weekend is pretty taken, but I hope to try putting this patch against
2.6.21-rc6-git1 (or whatever is current Monday), to see not only how it
works against the test program, but also under some actual load. By eye,
my data should be safe, but I think I'll test on a well backed-up machine
anyway ;-)
--
bill davidsen <davidsen@tmr.com>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979