linux-raid.vger.kernel.org archive mirror
From: "Raz Ben-Jehuda(caro)" <raziebe@gmail.com>
To: Neil Brown <neilb@suse.de>
Cc: Linux RAID Mailing List <linux-raid@vger.kernel.org>
Subject: Re: raid5 write performance
Date: Thu, 19 Apr 2007 11:28:48 +0300
Message-ID: <5d96567b0704190128q17b09dc9ld7c95a57a1edfb9f@mail.gmail.com>
In-Reply-To: <5d96567b0704161329n5c3ca008p56df00baaa16eacb@mail.gmail.com>

On 4/16/07, Raz Ben-Jehuda(caro) <raziebe@gmail.com> wrote:
> On 4/13/07, Neil Brown <neilb@suse.de> wrote:
> > On Saturday March 31, raziebe@gmail.com wrote:
> > >
> > > 4.
> > > I am going to work on this with other configurations, such as raid5's
> > > with more disks and raid50.  I will be happy to hear your opinion on
> > > this matter. What puzzles me is why the deadline must be as long as 10 ms.
> > > The shorter the deadline, the more reads I get.
> >
> > I've finally had a bit of a look at this.
> >
> > The extra reads are being caused by the 3msec unplug
> > timeout. Once you plug a queue it will automatically get unplugged 3
> > msec later.  When this happens, any stripes that are on the pending
> > list (waiting to see if more blocks will be written to them) get
> > processed and some pre-reading happens.
> >
> > If you remove the 3msec timeout (I changed it to 300msec) in
> > block/ll_rw_blk.c, the reads go away.  However that isn't a good
> > solution.
> >
> > Your patch effectively ensures that a stripe gets to last at least N
> > msec before being unplugged and pre-reading starts.
> > Why does it need to be 10 msec?  Let's see.
> >
> > When you start writing, you will quickly fill up the stripe cache and
> > then have to wait for stripes to be fully written and become free
> > before you can start attaching more write requests.
> > You could have to wait for a full chunk-wide stripe to be written
> > before another chunk of stripes can proceed.  The first blocks of the
> > second stripe could stay in the stripe cache for the time it takes to
> > write out a stripe.
> >
> > With a 1024K chunk size and 30Meg/second write speed it will take 1/30
> > of a second to write out a chunk-wide stripe, or about 33msec.  So I'm
> > surprised you get by with a deadline of 'only' 10msec.  Maybe there is
> > some over-lapping of chunks that I wasn't taking into account (I did
> > oversimplify the model a bit).
> >
> > So, what is the right heuristic to use to determine when we should
> > start write-processing on an incomplete stripe?  Obviously '3msec' is
> > bad.
> >
> > It seems we don't want to start processing incomplete stripes while
> > there are full stripes being written, but we also don't want to hold
> > up incomplete stripes forever if some other thread is successfully
> > writing complete stripes.
> >
> > So maybe something like this:
> >  - We keep a (cyclic) counter of the number of stripes on which we
> >    have started write, and the number which have completed.
> >  - every time we add a write request to a stripe, we set the deadline
> >    to 3msec in the future, and we record in the stripe the current
> >    value of the number that have started write.
> >  - We process a stripe requiring preread when both the deadline
> >    has expired, and the count of completed writes reaches the recorded
> >    count of commenced writes.
> >
> > Does that make sense?  Would you like to try it?
> >
> > NeilBrown
> >
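
For reference, the heuristic Neil proposes above can be modelled in a few
lines of userspace Python. This is only a sketch under stated assumptions,
not md driver code: the names WriteTracker, Stripe, COUNTER_MOD and
may_preread are hypothetical, the counter width is an arbitrary choice, and
time is passed in as plain milliseconds.

```python
from dataclasses import dataclass

COUNTER_MOD = 1 << 16   # assumed width of the cyclic counters
DEADLINE_MS = 3         # deadline suggested in the quoted message


@dataclass
class Stripe:
    deadline_ms: int = 0        # earliest time at which preread may start
    recorded_started: int = 0   # snapshot of writes_started at the last add


class WriteTracker:
    """Hypothetical bookkeeping for the heuristic; not md driver code."""

    def __init__(self):
        self.writes_started = 0     # stripes on which a write has begun
        self.writes_completed = 0   # stripes whose write has finished

    def start_write(self):
        self.writes_started = (self.writes_started + 1) % COUNTER_MOD

    def complete_write(self):
        self.writes_completed = (self.writes_completed + 1) % COUNTER_MOD

    def add_write_request(self, stripe, now_ms):
        # Every new write pushes the deadline out and records how many
        # stripe writes had been started at that moment.
        stripe.deadline_ms = now_ms + DEADLINE_MS
        stripe.recorded_started = self.writes_started

    def may_preread(self, stripe, now_ms):
        # Wrap-safe test for "completed has reached the recorded value":
        # the cyclic distance must be less than half the counter range.
        distance = (self.writes_completed - stripe.recorded_started) % COUNTER_MOD
        caught_up = distance < COUNTER_MOD // 2
        return now_ms >= stripe.deadline_ms and caught_up
```

The point of the second condition is that an incomplete stripe is held back
while full-stripe writes that were already in flight are still completing,
but it cannot be held forever, because writes started later do not count
against it.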

Hello Neil,
I have been doing some thinking, and I feel we should take a different path here.
In my tests I actually accumulate the user's buffers and submit them when they
are ready, an elevator-like algorithm.

The main problem is that the number of IOs the stripe cache can hold is too
small. My suggestion is to add an elevator of bios in front of the stripe
cache, postponing the allocation of a new stripe for as long as possible.
This way we will be able to move as many IOs as possible to the "raid logic"
without congesting it, while still filling stripes where possible.

Pseudocode:

make_request()
  ...
  if IO direction is WRITE and IO is not in the stripe cache
    add IO to raid elevator
  ...

raid5d()
  ...
  if a set of IOs in the raid elevator makes up a full stripe
    move those IOs to raid handling
  while the oldest IO in the raid elevator has passed its deadline (3 ms?)
    move that IO to raid handling
  ...
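
To make the idea concrete, here is a small userspace model of the elevator
in Python. It is a sketch under assumptions, not kernel code: RaidElevator,
DATA_DISKS, add_write and run are hypothetical names, a stripe is assumed to
have 3 data blocks, and block numbers map to stripes by simple division. As
a simplification, when the oldest IO of a partial stripe passes the
deadline, the model releases all IOs queued for that stripe together rather
than one IO at a time.

```python
DATA_DISKS = 3      # assumed: data blocks per stripe (e.g. a 4-disk raid5)
DEADLINE_MS = 3     # the "3 ms?" from the pseudocode


class RaidElevator:
    """Hypothetical userspace model of the proposed elevator; not md code."""

    def __init__(self):
        # stripe number -> {block index within stripe: arrival time in ms}
        self.pending = {}

    def add_write(self, block, now_ms):
        # Queue a write; the first arrival time of each block is kept.
        stripe, index = divmod(block, DATA_DISKS)
        self.pending.setdefault(stripe, {}).setdefault(index, now_ms)

    def run(self, now_ms):
        """One raid5d-style pass: return the stripes handed to raid handling."""
        released = []
        # 1. Any stripe whose data blocks are all queued goes out at once.
        full = [s for s, blocks in self.pending.items()
                if len(blocks) == DATA_DISKS]
        for stripe in full:
            released.append((stripe, sorted(self.pending.pop(stripe)), "full"))
        # 2. Partial stripes go out once their oldest IO passes the deadline.
        expired = [s for s, blocks in self.pending.items()
                   if now_ms - min(blocks.values()) >= DEADLINE_MS]
        for stripe in expired:
            released.append((stripe, sorted(self.pending.pop(stripe)), "deadline"))
        return released
```

In this model a full stripe never waits, so it needs no preread, and only
partial stripes pay the deadline delay before they reach the stripe cache.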

Does this make any sense?

Thank you,
-- 
Raz
