All of lore.kernel.org
 help / color / mirror / Atom feed
From: Shaohua Li <shli@kernel.org>
To: NeilBrown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org, axboe@kernel.dk, shli@fusionio.com
Subject: Re: [patch 2/4] raid1: percpu dispatch for write request if bitmap supported
Date: Thu, 24 May 2012 13:50:15 +0800	[thread overview]
Message-ID: <20120524055015.GA5579@kernel.org> (raw)
In-Reply-To: <20120524153412.106d5a0c@notabene.brown>

On Thu, May 24, 2012 at 03:34:12PM +1000, NeilBrown wrote:
> On Thu, 24 May 2012 13:17:09 +0800 Shaohua Li <shli@kernel.org> wrote:
> 
> > On Thu, May 24, 2012 at 01:34:02PM +1000, NeilBrown wrote:
> > > On Wed, 23 May 2012 15:26:21 +0800 Shaohua Li <shli@kernel.org> wrote:
> > > 
> > > > In raid1, all write requests are dispatched in raid1d thread. In fast storage,
> > > > the raid1d thread is a bottleneck, because it dispatches request too slow. Also
> > > > raid1d thread migrates freely, which makes request completion cpu not match
> > > > with submission cpu even driver/block layer has such capability. This will
> > > > cause bad cache issue.
> > > > 
> > > > If bitmap support is enabled, write requests can only be dispatched after dirty
> > > > bitmap is flushed out. After bitmap is flushed, how write requests are
> > > > dispatched doesn't impact correctness. A natural idea is to distribute request
> > > > dispatch to several threads. With this patch, requests are added to a percpu
> > > > list first. After bitmap is flushed, then the percpu list requests will
> > > > dispatched in a workqueue. In this way, above bottleneck is removed.
> > > > 
> > > > In a 4k randwrite test with a 2 disks setup, below patch can provide 10% ~ 50%
> > > > performance improvements depending on numa binding.
> > > 
> > > Those numbers are quite impressive so there is certainly room for improvement
> > > here.  I'm not sure that I'm entirely comfortable with the approach though.
> > > 
> > > Passing the request to a per-cpu thread does make sense, but a generic
> > > per-cpu thread feels dangerous as we don't know what else might be queued to
> > > that thread and because of the potential for deadlocks between memory
> > > allocation and generic_make_request (as mentioned in previous email) I find
> > > it hard to convince myself that this approach is entirely safe.
> > 
> > No problem, we can use a separate workqueue.
> 
> A separate set of per-cpu workqueues for each md array doesn't sound like a
> good idea to me - if we can possibly avoid it.

Can be a separate workqueue sharing for all md arrays.
  
> > > I wonder if we might take a very different approach and try to do everything
> > > in the one process.  i.e. don't hand tasks off to other threads at all - at
> > > least in the common case. 
> > > So we could change plugger_unplug (in md.c) so that:
> > > 
> > >  - if current->bio_list is NULL, (meaning all requests have been submitted
> > >    and there is no risk of deadlock) we call bitmap_unplug and then submit
> > >    all the queued writes.
> > >  - if current->bio_list is not NULL, then we just wakeup the md thread to
> > >    do the work.
> > 
> > The current->bio_list check does make sense. I'm going to do the check for the
> > 1 and 3 patches.
> > 
> > But I believe we can't call generic_make_request in unplug. For example,
> > schedule()->unplug->generic_make_request->get_request_wait()->schedule(). At
> 
> My ideas was to only submit the queued writes from a normal call to
> blk_finish_plug.  Calls that come from schedule would use the current
> approach of signalling the md thread to take over.  There must be a way to
> tell the difference - maybe just a new flag to the unplug callback.

Should be the same I think. For exmaple, blk_finish_plug unplug, plug_cb should
generate several bios, but for the first bio, get_request_wait sleeps and flush
plug list again. later bio hasn't a chance to be queued actually till next time
the task runs. Then unplug loses its meaning.

Thanks,
Shaohua

  reply	other threads:[~2012-05-24  5:50 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-05-23  7:26 [patch 0/4] MD: improve raid1/10 write performance for fast storage Shaohua Li
2012-05-23  7:26 ` [patch 1/4] raid1: directly dispatch write request if no bitmap Shaohua Li
2012-05-24  2:21   ` NeilBrown
2012-05-24  2:54     ` Shaohua Li
2012-05-24  3:09       ` NeilBrown
2012-05-24  3:31         ` Shaohua Li
2012-05-24  3:51           ` NeilBrown
2012-05-23  7:26 ` [patch 2/4] raid1: percpu dispatch for write request if bitmap supported Shaohua Li
2012-05-24  3:34   ` NeilBrown
2012-05-24  5:17     ` Shaohua Li
2012-05-24  5:34       ` NeilBrown
2012-05-24  5:50         ` Shaohua Li [this message]
2012-05-23  7:26 ` [patch 3/4] raid10: directly dispatch write request if no bitmap Shaohua Li
2012-05-23  7:26 ` [patch 4/4] raid10: percpu dispatch for write request if bitmap supported Shaohua Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120524055015.GA5579@kernel.org \
    --to=shli@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=linux-raid@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=shli@fusionio.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.