linux-raid.vger.kernel.org archive mirror
From: Shaohua Li <shli@kernel.org>
To: Dan Williams <dan.j.williams@intel.com>
Cc: NeilBrown <neilb@suse.de>,
	linux-raid@vger.kernel.org, axboe@kernel.dk, shli@fusionio.com
Subject: Re: [patch 8/8] raid5: create multiple threads to handle stripes
Date: Thu, 21 Jun 2012 18:09:55 +0800	[thread overview]
Message-ID: <20120621100955.GA255@kernel.org> (raw)
In-Reply-To: <CAA9_cmc0C3V9SG49UpxW+GAhWo+O8+cnyac6OxROkiTeqcrRcA@mail.gmail.com>

On Tue, Jun 12, 2012 at 09:08:17PM -0700, Dan Williams wrote:
> On Wed, Jun 6, 2012 at 11:45 PM, Shaohua Li <shli@kernel.org> wrote:
> > On Thu, Jun 07, 2012 at 11:39:58AM +1000, NeilBrown wrote:
> >> On Mon, 04 Jun 2012 16:02:00 +0800 Shaohua Li <shli@kernel.org> wrote:
> >>
> >> > Like raid 1/10, raid5 uses one thread to handle stripes. On fast storage, that
> >> > thread becomes a bottleneck. raid5 can offload calculations like checksums to
> >> > async threads, but if the storage is fast, scheduling and running the async
> >> > work introduces heavy lock contention in the workqueue, which makes such
> >> > optimization useless. And calculation isn't the only bottleneck: in my test,
> >> > the raid5 thread must handle > 450k requests per second, and just doing
> >> > dispatch and completion overwhelms a single thread. The only way to
> >> > scale is to use several threads to handle stripes.
> >> >
> >> > With this patch, the user can create several extra threads to handle stripes.
> >> > The optimal number of threads depends on the number of disks, so the thread
> >> > count can be changed from userspace. By default, the thread number is 0,
> >> > which means no extra threads.
> >> >
> >> > In a 3-disk raid5 setup, 2 extra threads provide a 130% throughput
> >> > improvement (with double the stripe_cache_size), and the throughput is
> >> > pretty close to the theoretical value. With >= 4 disks the improvement is
> >> > even bigger, for example 200% for a 4-disk setup, but the throughput is far
> >> > below the theoretical value. That is caused by several factors, such as
> >> > request queue lock contention, cache issues, and latency introduced by how
> >> > a stripe is handled across different disks. Those factors need further
> >> > investigation.
> >> >
> >> > Signed-off-by: Shaohua Li <shli@fusionio.com>
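
As a rough illustration of the userspace-tunable thread count described above,
a hypothetical md sysfs store handler could look like the sketch below; the
"aux_threads" knob and resize_aux_threads() are illustrative names, not taken
from the patch:

	/* Hypothetical store handler for an "aux_threads" md sysfs
	 * attribute; the real patch's interface may differ. */
	static ssize_t
	aux_threads_store(struct mddev *mddev, const char *page, size_t len)
	{
		unsigned long n;

		if (kstrtoul(page, 10, &n) || n > num_online_cpus())
			return -EINVAL;
		if (mddev_lock(mddev))		/* serialize with reconfig */
			return -EINTR;
		resize_aux_threads(mddev, n);	/* hypothetical helper */
		mddev_unlock(mddev);
		return len;
	}
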
> >>
> >> I think it is great that you have got RAID5 to the point where multiple
> >> threads improve performance.
> >> I really don't like the idea of having to configure the number of threads.
> >>
> >> It would be great if it would auto-configure.
> >> Maybe the main thread could fork aux threads when it notices a high load.
> >> e.g. if it has been servicing requests for more than 100ms without a break,
> >> and the number of threads is less than the number of CPUs, then it forks a new
> >> helper and resets the timer.
> >>
> >> If a thread has been idle for more than 30 minutes, it exits.
> >>
> >> Might that be reasonable?
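
A minimal sketch of that heuristic, assuming a hypothetical raid5_aux_thread()
helper and made-up bookkeeping fields (busy_since, nr_aux_threads) on the
r5conf structure:

	/* Fork a helper if the main thread has been busy for more than
	 * 100ms and there are fewer threads than CPUs.  A helper would
	 * similarly track its own idle time and exit after 30 minutes. */
	static void raid5d_maybe_fork(struct r5conf *conf)
	{
		struct task_struct *t;

		if (!time_after(jiffies,
				conf->busy_since + msecs_to_jiffies(100)) ||
		    conf->nr_aux_threads >= num_online_cpus())
			return;
		t = kthread_run(raid5_aux_thread, conf, "raid5aux/%d",
				conf->nr_aux_threads);
		if (!IS_ERR(t)) {
			conf->nr_aux_threads++;
			conf->busy_since = jiffies;	/* reset the timer */
		}
	}
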
> >
> > Yep, I bet this patch needs more discussion; auto-configuration is preferred,
> > and your idea is worth trying. However, the concern is that with automatic
> > forking/killing of threads, the user can't do NUMA binding, which is important
> > for high-speed storage. Maybe have a reasonable default thread number, like
> > one thread per disk? This needs more investigation; I'm open to any
> > suggestions on this front.
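
For illustration, such a default could create one NUMA-local helper per member
disk; disk_to_node() and raid5_aux_thread() below are hypothetical names:

	/* One helper thread per member disk, created on the NUMA node
	 * closest to that disk so no manual binding is needed. */
	static void raid5_spawn_per_disk_threads(struct r5conf *conf)
	{
		int i;

		for (i = 0; i < conf->raid_disks; i++) {
			/* disk_to_node(): hypothetical helper mapping a
			 * member disk to its nearest NUMA node. */
			int node = disk_to_node(conf, i);
			struct task_struct *t;

			t = kthread_create_on_node(raid5_aux_thread, conf,
						   node, "raid5aux/%d", i);
			if (!IS_ERR(t))
				wake_up_process(t);
		}
	}
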
> 
> The last time I looked at this, the btrfs thread pool looked like a
> good candidate:
> 
>   http://marc.info/?l=linux-raid&m=126944260704907&w=2
> 
> ...I haven't checked whether Tejun has made this available as a generic workqueue mode.

I tried creating an UNBOUND workqueue with max_active set to the number of
CPUs, so each CPU handles one work item, and each work item handles 8 stripes.
The throughput is relatively OK, but CPU utilization is very high compared to
just creating 3 or 4 threads as the patch does. There is heavy lock contention
on the block layer queue_lock, since every CPU now dispatches requests. There
are other issues too, like cache effects, and the raid5 device_lock sees more
contention as well. It appears that having too many threads handling stripes
isn't as good as expected.
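
Roughly, the experiment had the following shape; the r5worker structure and
next_stripe_to_handle() are illustrative stand-ins, not the code actually
tested:

	/* Each unbound work item handles a small batch of stripes. */
	static void raid5_stripe_work(struct work_struct *work)
	{
		struct r5worker *worker =
			container_of(work, struct r5worker, work);
		struct r5conf *conf = worker->conf;
		int i;

		for (i = 0; i < 8; i++) {
			struct stripe_head *sh = next_stripe_to_handle(conf);
			if (!sh)
				break;
			handle_stripe(sh);
			release_stripe(sh);
		}
	}

	/* Setup: at most one active work item per CPU. */
	conf->wq = alloc_workqueue("raid5", WQ_UNBOUND, num_online_cpus());
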

Thanks,
Shaohua

Thread overview: 34+ messages
2012-06-04  8:01 [patch 0/8] raid5: improve write performance for fast storage Shaohua Li
2012-06-04  8:01 ` [patch 1/8] raid5: add a per-stripe lock Shaohua Li
2012-06-07  0:54   ` NeilBrown
2012-06-07  6:29     ` Shaohua Li
2012-06-07  6:35       ` NeilBrown
2012-06-07  6:52         ` Shaohua Li
2012-06-12 21:02           ` Dan Williams
2012-06-13  4:08             ` Dan Williams
2012-06-13  4:23               ` Shaohua Li
2012-06-12 21:10   ` Dan Williams
2012-06-04  8:01 ` [patch 2/8] raid5: lockless access raid5 overrided bi_phys_segments Shaohua Li
2012-06-07  1:06   ` NeilBrown
2012-06-12 20:41     ` Dan Williams
2012-06-04  8:01 ` [patch 3/8] raid5: remove some device_lock locking places Shaohua Li
2012-06-04  8:01 ` [patch 4/8] raid5: reduce chance release_stripe() taking device_lock Shaohua Li
2012-06-07  0:50   ` NeilBrown
2012-06-04  8:01 ` [patch 5/8] raid5: add batch stripe release Shaohua Li
2012-06-04  8:01 ` [patch 6/8] raid5: make_request use " Shaohua Li
2012-06-07  1:23   ` NeilBrown
2012-06-07  6:33     ` Shaohua Li
2012-06-07  7:33       ` NeilBrown
2012-06-07  7:58         ` Shaohua Li
2012-06-08  6:16           ` Shaohua Li
2012-06-08  6:42             ` NeilBrown
2012-06-04  8:01 ` [patch 7/8] raid5: raid5d handle stripe in batch way Shaohua Li
2012-06-07  1:32   ` NeilBrown
2012-06-07  6:35     ` Shaohua Li
2012-06-07  7:38       ` NeilBrown
2012-06-04  8:02 ` [patch 8/8] raid5: create multiple threads to handle stripes Shaohua Li
2012-06-07  1:39   ` NeilBrown
2012-06-07  6:45     ` Shaohua Li
2012-06-13  4:08       ` Dan Williams
2012-06-21 10:09         ` Shaohua Li [this message]
2012-07-02 20:43           ` Dan Williams
