From: Shaohua Li <shli@kernel.org>
To: linux-raid@vger.kernel.org
Cc: neilb@suse.de, axboe@kernel.dk, dan.j.williams@intel.com,
	shli@fusionio.com
Subject: [patch 00/10 v3] raid5: improve write performance for fast storage
Date: Mon, 25 Jun 2012 15:24:47 +0800	[thread overview]
Message-ID: <20120625072447.268095276@kernel.org> (raw)

Hi,

Like raid1/10, raid5 uses a single thread to handle stripes. On fast storage,
that thread becomes the bottleneck. raid5 can offload calculations such as
checksumming to async threads, but if the storage is fast, scheduling and
running the async work introduces heavy workqueue lock contention, which makes
the offload useless. And calculation isn't the only bottleneck: in my test the
raid5 thread must handle > 450k requests per second, so just doing dispatch
and completion is enough to overwhelm it. The only way to scale is to use
several threads to handle stripes.
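
To make the bottleneck concrete, here is a minimal userspace model of the
single-threaded design described above. This is an illustrative sketch, not
the md code: "stripe", "handle_list" and the pthread primitives are stand-ins
for the real raid5 structures and kernel locking.

/* Minimal model of a single raid5d-style handler thread.  Every
 * stripe, no matter which disk it touches, funnels through this one
 * loop, so at ~450k requests/sec the loop saturates long before the
 * disks do. */
#include <pthread.h>
#include <stdlib.h>

struct stripe {
    struct stripe *next;
    /* parity/state fields omitted */
};

static struct stripe *handle_list;      /* stripes waiting for work */
static pthread_mutex_t device_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  work_ready  = PTHREAD_COND_INITIALIZER;

static void handle_stripe(struct stripe *sh)
{
    (void)sh;   /* compute parity, dispatch and complete I/O */
}

static void *raid5d_model(void *unused)
{
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&device_lock);
        while (!handle_list)
            pthread_cond_wait(&work_ready, &device_lock);
        struct stripe *sh = handle_list;
        handle_list = sh->next;
        pthread_mutex_unlock(&device_lock);

        handle_stripe(sh);  /* all CPU work lands on this one thread */
    }
    return NULL;
}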

Simply using several threads doesn't work: conf->device_lock is a global lock
and is heavily contended. Patches 3-9 of the set address this problem. With
them, when several threads are handling stripes, device_lock is still
contended, but it takes much less CPU time and is no longer the hottest lock.
Even if the 10th patch isn't accepted, patches 3-9 look good to merge.
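
The general shape of the fix, inferred here from the patch titles rather than
the diffs themselves, is to shrink what device_lock protects: per-stripe state
moves under a per-stripe lock (patch 03), so the global lock is held only
briefly for list manipulation. A hedged userspace sketch:

/* Locking split sketch: device_lock guards only the shared list, and
 * each stripe carries its own lock for its state, so handler threads
 * contend only when they touch the same stripe. */
#include <pthread.h>

struct stripe {
    struct stripe *next;
    pthread_mutex_t stripe_lock;    /* per-stripe lock (cf. patch 03) */
    int state;                      /* now protected by stripe_lock */
};

static struct stripe *handle_list;
static pthread_mutex_t device_lock = PTHREAD_MUTEX_INITIALIZER;

static void handle_one(void)
{
    /* Global lock held only long enough to pop one list entry... */
    pthread_mutex_lock(&device_lock);
    struct stripe *sh = handle_list;
    if (sh)
        handle_list = sh->next;
    pthread_mutex_unlock(&device_lock);
    if (!sh)
        return;

    /* ...while the expensive per-stripe work takes only the
     * stripe's own lock. */
    pthread_mutex_lock(&sh->stripe_lock);
    sh->state |= 1;                 /* placeholder for real handling */
    pthread_mutex_unlock(&sh->stripe_lock);
}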

I ran a stress test (block sizes ranging from 1k to 64k with a small total
size, so overlap/stripe sharing is guaranteed) with the patches applied, and
everything looks fine except for some issues fixed by the first two patches.
Those issues aren't related to this series, but I need the fixes for the
stress test.
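
For reference, a sketch of the kind of stress test described: random 1k-64k
writes at random offsets inside a small region, so stripes are constantly
shared and overlapped. The device path, region size and iteration count are
hypothetical placeholders.

/* Stress-test sketch: random 1k-64k writes confined to a small
 * region so stripe sharing/overlap is constant.  The device path
 * and constants are hypothetical. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define REGION (64 * 1024 * 1024)   /* small total size forces overlap */

int main(void)
{
    static char buf[64 * 1024];
    int fd = open("/dev/md0", O_WRONLY);    /* hypothetical array */

    if (fd < 0) { perror("open"); return 1; }
    for (long i = 0; i < 1000000; i++) {
        size_t len = (size_t)(1 + rand() % 64) * 1024;  /* 1k .. 64k */
        off_t  off = (off_t)(rand() % (REGION / 1024)) * 1024;
        if (pwrite(fd, buf, len, off) < 0)
            perror("pwrite");
    }
    close(fd);
    return 0;
}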

With the locking issue solved (at least largely), switching stripe handling to
multiple threads is trivial.

Threads are still created in advance (the default thread count equals the disk
count) and the count can be reconfigured by the user. Automatically creating
and reaping threads would be great, but I'm worried about NUMA binding.
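
A minimal sketch of the pre-created pool, again as a userspace model rather
than the actual md code: a fixed number of workers, defaulting to the disk
count, started up front. The comment marks where a NUMA-aware variant would
bind each worker; nr_disks here is a placeholder.

/* Pre-created worker pool sketch: nr_workers threads started up
 * front, defaulting to one per disk.  nr_disks is a placeholder. */
#include <pthread.h>
#include <stdio.h>

#define MAX_WORKERS 64

static void *stripe_worker(void *arg)
{
    long id = (long)arg;
    /* A NUMA-aware variant would pin itself here, e.g. with
     * pthread_setaffinity_np(), before touching stripe data. */
    printf("worker %ld ready\n", id);
    /* ...pull stripes from the shared list, as in the model above... */
    return NULL;
}

int main(void)
{
    long nr_disks = 4;              /* placeholder disk count */
    long nr_workers = nr_disks;     /* default: one worker per disk */
    pthread_t tid[MAX_WORKERS];

    for (long i = 0; i < nr_workers; i++)
        pthread_create(&tid[i], NULL, stripe_worker, (void *)i);
    for (long i = 0; i < nr_workers; i++)
        pthread_join(tid[i], NULL);
    return 0;
}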

In a 3-disk raid5 setup, 2 extra threads provide a 130% throughput improvement
(with stripe_cache_size doubled), and the throughput is pretty close to the
theoretical value. With >= 4 disks the improvement is even bigger, for example
about 200% in a 4-disk setup, but the throughput falls far short of the
theoretical value. The gap is caused by several factors, such as request queue
lock contention, cache effects, and the latency introduced by how a stripe is
handled across different disks. Those factors need further investigation.

V2->V3:
1. Fixed a hang caused by a stripe with both the STRIPE_DELAYED and
STRIPE_PREREAD_ACTIVE bits set.
2. Fixed an issue pointed out by Dan.
3. No longer always wakes up all worker threads (sketched below).
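
The idea behind item 3, as a userspace analogy rather than the actual md
change: wake one waiting worker per queued stripe instead of broadcasting to
all of them.

/* Wake-up sketch: one signal per queued stripe instead of a
 * broadcast that wakes every worker only for most of them to find
 * nothing to do. */
#include <pthread.h>

static pthread_mutex_t device_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  work_ready  = PTHREAD_COND_INITIALIZER;
static int pending;

static void queue_stripe(void)
{
    pthread_mutex_lock(&device_lock);
    pending++;
    pthread_mutex_unlock(&device_lock);
    /* Before: pthread_cond_broadcast(&work_ready) woke all workers
     * (a thundering herd).  One stripe needs only one waiter: */
    pthread_cond_signal(&work_ready);
}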

V1->V2:
1. Fixed several issues pointed out by Neil and Dan.
2. Fixed a wake_up issue.

Thanks,
Shaohua

Thread overview: 32+ messages
2012-06-25  7:24 Shaohua Li [this message]
2012-06-25  7:24 ` [patch 01/10 v3] raid5: use wake_up_all for overlap waking Shaohua Li
2012-06-28  7:26   ` NeilBrown
2012-06-28  8:53     ` Shaohua Li
2012-06-25  7:24 ` [patch 02/10 v3] raid5: delayed stripe fix Shaohua Li
2012-07-02  0:46   ` NeilBrown
2012-07-02  0:49     ` Shaohua Li
2012-07-02  0:55       ` NeilBrown
2012-06-25  7:24 ` [patch 03/10 v3] raid5: add a per-stripe lock Shaohua Li
2012-07-02  0:50   ` NeilBrown
2012-07-02  3:16     ` Shaohua Li
2012-07-02  7:39       ` NeilBrown
2012-07-03  1:27         ` Shaohua Li
2012-07-03 12:16         ` majianpeng
2012-07-03 23:56           ` NeilBrown
2012-07-04  1:09             ` majianpeng
2012-06-25  7:24 ` [patch 04/10 v3] raid5: lockless access raid5 overrided bi_phys_segments Shaohua Li
2012-06-25  7:24 ` [patch 05/10 v3] raid5: remove some device_lock locking places Shaohua Li
2012-06-25  7:24 ` [patch 06/10 v3] raid5: reduce chance release_stripe() taking device_lock Shaohua Li
2012-07-02  0:57   ` NeilBrown
2012-06-25  7:24 ` [patch 07/10 v3] md: personality can provide unplug private data Shaohua Li
2012-07-02  1:06   ` NeilBrown
2012-06-25  7:24 ` [patch 08/10 v3] raid5: make_request use batch stripe release Shaohua Li
2012-07-02  2:31   ` NeilBrown
2012-07-02  2:59     ` Shaohua Li
2012-07-02  5:07       ` NeilBrown
2012-06-25  7:24 ` [patch 09/10 v3] raid5: raid5d handle stripe in batch way Shaohua Li
2012-07-02  2:32   ` NeilBrown
2012-06-25  7:24 ` [patch 10/10 v3] raid5: create multiple threads to handle stripes Shaohua Li
2012-07-02  2:39   ` NeilBrown
2012-07-02 20:03   ` Dan Williams
2012-07-03  8:04     ` Shaohua Li
