From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Raz Ben-Jehuda(caro)"
Subject: Re: raid5 write performance
Date: Sun, 1 Apr 2007 01:03:11 +0200
Message-ID: <5d96567b0703311603p1c1cb0fehfff7e94df49c0b4c@mail.gmail.com>
References: <5d96567b0607020702p25d66490i79445bac606e5210@mail.gmail.com>
 <17576.18978.563672.656847@cse.unsw.edu.au>
 <5d96567b0608130619w60d8d883q4ffbfefcf650ee82@mail.gmail.com>
 <17650.29175.778076.964022@cse.unsw.edu.au>
 <5d96567b0703301444j9b416c2nbc5ce27487eef5bc@mail.gmail.com>
 <460ED281.2020409@tmr.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <460ED281.2020409@tmr.com>
Content-Disposition: inline
Sender: linux-raid-owner@vger.kernel.org
To: Bill Davidsen
Cc: Neil Brown, Linux RAID Mailing List
List-Id: linux-raid.ids

On 3/31/07, Bill Davidsen wrote:
> Raz Ben-Jehuda(caro) wrote:
> > Please see below.
> >
> > On 8/28/06, Neil Brown wrote:
> >> On Sunday August 13, raziebe@gmail.com wrote:
> >> > well ... me again
> >> >
> >> > Following your advice....
> >> >
> >> > I added a deadline for every WRITE stripe head when it is created.
> >> > In raid5_activate_delayed I check whether the deadline has expired;
> >> > if it has not, I do not yet move the sh to preread-active mode.
> >> >
> >> > This small fix (together with changes in a few other places in the
> >> > code) reduced the number of reads to zero with dd, but with no
> >> > improvement in throughput. With random access to the raid, however
> >> > (buffers aligned to the stripe width and of stripe-width size),
> >> > there is an improvement of at least 20%.
> >> >
> >> > The problem is that the user must know what he is doing, otherwise
> >> > there is a reduction in performance if the deadline is too long
> >> > (say 100 ms).
> >>
> >> So if I understand you correctly, you are delaying write requests to
> >> partial stripes slightly (your 'deadline') and this is sometimes
> >> giving you a 20% improvement?
> >>
> >> I'm not surprised that you could get some improvement. 20% is quite
> >> surprising. It would be worth following through with this to make
> >> that improvement generally available.
> >>
> >> As you say, picking a time in milliseconds is very error prone. We
> >> really need to come up with something more natural.
> >> I had hoped that the 'unplug' infrastructure would provide the right
> >> thing, but apparently not. Maybe unplug is just being called too
> >> often.
> >>
> >> I'll see if I can duplicate this myself and find out what is really
> >> going on.
> >>
> >> Thanks for the report.
> >>
> >> NeilBrown
> >>
> >
> > Neil, hello. I am sorry for the long interval; I was abruptly assigned
> > to a different project.
> >
> > 1.
> > I took another look at the raid5 delay patch I wrote a while ago. I
> > ported it to 2.6.17 and tested it. It appears to work, and when used
> > correctly it eliminates the read penalty.
> >
> > 2. Benchmarks.
> > Configuration:
> > I am testing a 3-disk raid5 with a 1MB chunk size. IOs are synchronous
> > and non-buffered (O_DIRECT), 2 MB in size and always aligned to the
> > beginning of a stripe. The kernel is 2.6.17. The stripe_delay was set
> > to 10ms.
> >
> > Attached is the simple_write code.
> >
> > Command:
> > simple_write /dev/md1 2048 0 1000
> > simple_write writes raw (O_DIRECT) and sequentially, starting from
> > offset zero, 2048 kilobytes at a time, 1000 times.
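(The simple_write attachment itself is not reproduced here. Purely as an
illustration, here is a minimal sketch of what such an O_DIRECT tester
could look like, assuming the argument order <device> <io_kb> <offset_kb>
<count> implied by the command line above; it is a guess at the shape of
the program, not the attached code.)

#define _GNU_SOURCE             /* for O_DIRECT */
#define _FILE_OFFSET_BITS 64    /* large devices */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	if (argc != 5) {
		fprintf(stderr, "usage: %s <device> <io_kb> <offset_kb> <count>\n",
			argv[0]);
		return 1;
	}

	size_t io_size = (size_t)atoll(argv[2]) * 1024;
	off_t offset   = (off_t)atoll(argv[3]) * 1024;
	long count     = atol(argv[4]);

	int fd = open(argv[1], O_WRONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* O_DIRECT requires an aligned buffer */
	void *buf;
	if (posix_memalign(&buf, 4096, io_size)) {
		perror("posix_memalign");
		return 1;
	}
	memset(buf, 0xab, io_size);

	/* sequential, stripe-aligned writes straight to the raw device */
	for (long i = 0; i < count; i++) {
		if (pwrite(fd, buf, io_size, offset) != (ssize_t)io_size) {
			perror("pwrite");
			return 1;
		}
		offset += io_size;
	}

	free(buf);
	close(fd);
	return 0;
}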
> >
> > Benchmark before patch (iostat output; columns are tps, reads/s,
> > writes/s, total read, total written):
> >
> > sda     1848.00     8384.00    50992.00      8384     50992
> > sdb     1995.00    12424.00    51008.00     12424     51008
> > sdc     1698.00     8160.00    51000.00      8160     51000
> > sdd        0.00        0.00        0.00         0         0
> > md0        0.00        0.00        0.00         0         0
> > md1      450.00        0.00   102400.00         0    102400
> >
> > Benchmark after patch:
> >
> > sda      389.11        0.00   128530.69         0    129816
> > sdb      381.19        0.00   129354.46         0    130648
> > sdc      383.17        0.00   128530.69         0    129816
> > sdd        0.00        0.00        0.00         0         0
> > md0        0.00        0.00        0.00         0         0
> > md1     1140.59        0.00   259548.51         0    262144
> >
> > As one can see, no additional reads were done. One can actually
> > calculate the raid's utilization: (n-1)/n * (single-disk throughput
> > with 1M writes).
> >
> >
> > 3. The patch code.
> > The kernel tested above was 2.6.17. The patch is against 2.6.20.2,
> > because I noticed big code differences between .17 and .20.x. This
> > patch has not been tested on 2.6.20.2, but it is essentially the same.
> > I have not (yet) tested degraded mode or any other non-common paths.
>
> My weekend is pretty taken, but I hope to try putting this patch against
> 2.6.21-rc6-git1 (or whatever is current Monday), to see not only how it
> works against the test program, but also under some actual load. By eye,
> my data should be safe, but I think I'll test on a well backed-up
> machine anyway ;-)

Bill, this test program WRITES data to a raw device; it will destroy
everything you have on the RAID. If you want a file system test unit, as
mentioned, I have one for the XFS file system.

> --
> bill davidsen
> CTO TMR Associates, Inc
> Doing interesting things with small computers since 1979
>
>

-- 
Raz
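(A quick sanity check of that (n-1)/n utilization estimate against the
after-patch figures above, treating the per-second write columns as
comparable units:)

\[
\frac{259548.51}{128530.69 + 129354.46 + 128530.69} \approx 0.67 \approx \frac{n-1}{n}, \qquad n = 3 .
\]

That is, the array delivers roughly two thirds of the aggregate raw disk
write bandwidth, which matches the parity overhead of a 3-disk raid5.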
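(The patch itself is not included in this message. Purely as an
illustration of the mechanism discussed above, namely stamping each WRITE
stripe_head with a deadline and letting it go preread-active only after
that deadline has passed, here is a minimal sketch against the raid5
internals. The 'deadline' field and the 'stripe_deadline_ms' tunable are
assumptions for illustration; this is not the actual patch.)

/*
 * Illustrative sketch only, not the actual patch.  Idea: when a WRITE
 * stripe_head is created, stamp it with a deadline; raid5_activate_delayed()
 * then moves a stripe to preread-active (allowing the read-modify-write
 * reads) only after that deadline has expired, giving full-stripe writes
 * a short window in which to form.  'deadline' is a hypothetical extra
 * field on struct stripe_head and 'stripe_deadline_ms' a hypothetical
 * tunable (10ms in the benchmark above).  Assumes drivers/md/raid5.h
 * for struct stripe_head.
 */
#include <linux/jiffies.h>

static unsigned int stripe_deadline_ms = 10;

/* called where a WRITE stripe_head is set up */
static void stripe_stamp_deadline(struct stripe_head *sh)
{
	sh->deadline = jiffies + msecs_to_jiffies(stripe_deadline_ms);
}

/* used by raid5_activate_delayed(): only expired stripes get promoted */
static int stripe_deadline_expired(struct stripe_head *sh)
{
	return time_after_eq(jiffies, sh->deadline);
}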