From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Raz Ben-Jehuda(caro)"
Subject: Re: raid5 write performance
Date: Sun, 13 Aug 2006 16:19:19 +0300
Message-ID: <5d96567b0608130619w60d8d883q4ffbfefcf650ee82@mail.gmail.com>
References: <5d96567b0607020702p25d66490i79445bac606e5210@mail.gmail.com>
	<17576.18978.563672.656847@cse.unsw.edu.au>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <17576.18978.563672.656847@cse.unsw.edu.au>
Content-Disposition: inline
Sender: linux-raid-owner@vger.kernel.org
To: Linux RAID Mailing List
Cc: Neil Brown
List-Id: linux-raid.ids

Well ... me again. Following your advice:

I added a deadline to every WRITE stripe head when it is created. In
raid5_activate_delayed I check whether that deadline has expired; only
an expired stripe head is moved to preread-active mode, otherwise it
stays delayed (a rough sketch of the check is at the end of this mail).

This small fix (plus changes in a few other places in the code) reduced
the number of reads to zero with dd, but with no improvement in
throughput. With random access to the raid, however (buffers aligned to
the stripe width and one stripe width in size), there is an improvement
of at least 20%.

The problem is that a user must know what he is doing; otherwise there
is a reduction in performance if the deadline is too long (say 100 ms).

raz

On 7/3/06, Neil Brown wrote:
> On Sunday July 2, raziebe@gmail.com wrote:
> > Neil hello.
> >
> > I have been looking at the raid5 code trying to understand why write
> > performance is so poor.
>
> raid5 write performance is expected to be poor, as you often need to
> pre-read data or parity before the write can be issued.
>
> > If I am not mistaken here, it seems that you issue writes in units
> > of one page and no more, no matter what buffer size I am using.
>
> I doubt the small write size would contribute more than a couple of
> percent to the speed issue.  Scheduling (when to write, when to
> pre-read, when to wait a moment) is probably much more important.
>
> > 1. Is this page directed only to the parity disk?
>
> No.  All drives are written in one-page units.  Each request is
> divided into one-page chunks, these one-page chunks are gathered -
> where possible - into strips, and the strips are handled as units
> (where a strip is like a stripe, only one page wide rather than one
> chunk wide - if that makes sense).
>
> > 2. How can I increase the write throughput?
>
> Look at scheduling patterns - in what order are the blocks getting
> written, do we pre-read when we don't need to, things like that.
>
> The current code tries to do the right thing, and it certainly has
> been worse in the past, but I wouldn't be surprised if it could still
> be improved.
>
> NeilBrown

-- 
Raz
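
P.S.  To make the deadline idea concrete, here is a small userspace
sketch of the check.  This is NOT the actual raid5.c diff: the
stripe_head struct below is a toy model, and the deadline/full_write
fields, the DEADLINE_MS value and the activate_delayed() helper are
simplified stand-ins for the real stripe_head state bits and lists.

/*
 * Userspace sketch of the deadline idea - not the raid5.c patch.
 * Compile with: cc -std=c99 -Wall sketch.c
 */
#include <stdbool.h>
#include <stdio.h>

#define NR_STRIPES   4
#define DEADLINE_MS  30   /* tunable; too large hurts small random writes */

struct stripe_head {
	unsigned long deadline;       /* set at creation: now + DEADLINE_MS */
	bool          full_write;     /* all data blocks of the stripe present */
	bool          preread_active; /* stand-in for STRIPE_PREREAD_ACTIVE */
};

/* Called where raid5_activate_delayed() would run in the kernel. */
static void activate_delayed(struct stripe_head *sh, int n, unsigned long now)
{
	for (int i = 0; i < n; i++) {
		/* A full-stripe write never needs a pre-read: release it. */
		if (sh[i].full_write) {
			sh[i].preread_active = true;
			continue;
		}
		/*
		 * Partial write: hold it back until its deadline expires,
		 * hoping the rest of the stripe arrives so the pre-read of
		 * old data/parity can be skipped entirely.
		 */
		sh[i].preread_active = (now >= sh[i].deadline);
	}
}

int main(void)
{
	struct stripe_head sh[NR_STRIPES] = {
		{ .deadline = 100 + DEADLINE_MS, .full_write = true  },
		{ .deadline = 100 + DEADLINE_MS, .full_write = false },
		{ .deadline =  50 + DEADLINE_MS, .full_write = false },
		{ .deadline =  10 + DEADLINE_MS, .full_write = false },
	};

	activate_delayed(sh, NR_STRIPES, 100);   /* "now" = 100 ms */

	for (int i = 0; i < NR_STRIPES; i++)
		printf("stripe %d: preread_active=%d\n", i, sh[i].preread_active);
	return 0;
}

The only point of the sketch is that a partial-write stripe is held
back until its deadline, so a sequential writer like dd gets a chance
to complete the stripe before any pre-read of old data or parity is
issued, while a full-stripe write is released immediately.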