From: Stan Hoeppner <stan@hardwarefreak.com>
To: NeilBrown <neilb@suse.de>
Cc: Shaohua Li <shli@kernel.org>,
linux-raid@vger.kernel.org, axboe@kernel.dk
Subject: Re: [patch 0/3 v3] MD: improve raid1/10 write performance for fast storage
Date: Fri, 29 Jun 2012 23:37:59 -0500
Message-ID: <4FEE82A7.4060602@hardwarefreak.com>
In-Reply-To: <20120629125256.31de1c2b@notabene.brown>
On 6/28/2012 9:52 PM, NeilBrown wrote:
> On Thu, 28 Jun 2012 20:29:21 -0500 Stan Hoeppner <stan@hardwarefreak.com>
> wrote:
>
>> On 6/28/2012 4:03 AM, NeilBrown wrote:
>>> On Wed, 13 Jun 2012 17:11:43 +0800 Shaohua Li <shli@kernel.org> wrote:
>>>
>>>> In raid1/10, all write requests are dispatched in a single thread. On fast
>>>> storage the thread becomes a bottleneck because it dispatches requests too
>>>> slowly. The thread also migrates freely, so the request-completion CPU does
>>>> not match the submission CPU even when the driver/block layer supports such
>>>> affinity. This causes bad cache behaviour. Neither issue matters much for
>>>> slow storage.
>>>>
>>>> Switching to per-cpu/per-thread dispatch dramatically increases performance,
>>>> and the more disks in the array, the bigger the boost. In a 4-disk raid10
>>>> setup it can double the throughput.
>>>>
>>>> Per-cpu/per-thread dispatch doesn't harm slow storage. It is how a raw
>>>> device is accessed anyway, and the correct block plug is set, which helps
>>>> merge requests and reduce lock contention.
>>>>
>>>> V2->V3:
>>>> rebase to latest tree and fix cpuhotplug issue
>>>>
>>>> V1->V2:
>>>> 1. dropped the direct-dispatch patches. They gave a bigger performance
>>>> improvement, but are hopeless to make correct.
>>>> 2. added an MD-specific workqueue to do per-cpu dispatch.
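
For anyone skimming, the mechanism described above boils down to something
like the following. This is a minimal sketch against the 3.x kernel APIs
(alloc_percpu, queue_work_on, blk_start_plug), not Shaohua's actual patch;
names like md_wq, dispatch and submit_write are made up for illustration:

    #include <linux/bio.h>
    #include <linux/blkdev.h>
    #include <linux/percpu.h>
    #include <linux/spinlock.h>
    #include <linux/workqueue.h>

    /* One dispatch context per CPU: a bio list plus a work item. */
    struct percpu_dispatch {
            struct bio_list bios;
            spinlock_t lock;
            struct work_struct work;
    };

    static struct workqueue_struct *md_wq;
    static struct percpu_dispatch __percpu *dispatch;

    /* Runs on the CPU that queued the bios; the plug lets the block
     * layer merge adjacent requests before they are issued. */
    static void dispatch_fn(struct work_struct *work)
    {
            struct percpu_dispatch *pd =
                    container_of(work, struct percpu_dispatch, work);
            struct blk_plug plug;
            struct bio *bio;

            blk_start_plug(&plug);
            for (;;) {
                    spin_lock(&pd->lock);
                    bio = bio_list_pop(&pd->bios);
                    spin_unlock(&pd->lock);
                    if (!bio)
                            break;
                    generic_make_request(bio);
            }
            blk_finish_plug(&plug);
    }

    /* Queue a write on the submitting CPU's list and kick the work
     * on that same CPU, so completion affinity matches submission. */
    static void submit_write(struct bio *bio)
    {
            struct percpu_dispatch *pd = get_cpu_ptr(dispatch);

            spin_lock(&pd->lock);
            bio_list_add(&pd->bios, bio);
            spin_unlock(&pd->lock);
            queue_work_on(smp_processor_id(), md_wq, &pd->work);
            put_cpu_ptr(dispatch);
    }

    static int __init dispatch_init(void)
    {
            int cpu;

            /* Error handling omitted for brevity. */
            md_wq = alloc_workqueue("md_dispatch", WQ_MEM_RECLAIM, 0);
            dispatch = alloc_percpu(struct percpu_dispatch);
            for_each_possible_cpu(cpu) {
                    struct percpu_dispatch *pd = per_cpu_ptr(dispatch, cpu);

                    bio_list_init(&pd->bios);
                    spin_lock_init(&pd->lock);
                    INIT_WORK(&pd->work, dispatch_fn);
            }
            return 0;
    }

The plug around the drain loop is what gets the request merging the cover
letter mentions, and queue_work_on() is what keeps dispatch on the
submitting CPU.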
>>
>>
>>> I still don't like the per-cpu allocations and the extra work queues.
>>
>> Why don't you like this method, Neil? Complexity? The performance seems
>> to be there.
>>
>
> Not an easy question to answer. It just doesn't "taste" nice.
> I certainly like the performance and if this is the only way to get that
> performance then we'll probably go that way. But I'm not convinced it is the
> only way and I want to explore other options first.
I completely agree with the philosophy of exploring multiple options.
> I guess it feels a bit heavy handed. On machines with 1024 cores, per-cpu
> allocations and per-cpu threads are not as cheap as they are on 2-core
> machines. And I'm hoping for a 1024-core phone soon :-)
The only 1024-core machines on the planet are SGI Altix UV (up to 2560
cores), and they make extensive use of per-cpu allocations and threads
in both XVM (the SGI Linux volume manager) and XFS. Keep in mind that
the CpuMemSets API which enables this originated at SGI. The storage is
FC SAN RAID, and XVM is used to stripe or concatenate the hw RAID LUNs.
Without per-cpu threads this machine's IO couldn't scale.
Quoting Geoffrey Wehrman of SGI, from a post to the XFS list:
"With an SGI IS16000 array which supports up to 1,200 drives,
filesystems with large numbers of drives isn't difficult. Most
configurations using the IS16000 have 8+2 RAID6 luns. I've seen
sustained 15 GB/s to a single filesystem on one of the arrays with a 600
drive configuration. The scalability of XFS is impressive."
Without per-cpu threads in XVM and XFS this level of throughput wouldn't
be possible. XVM is closed source, but the XFS devs would probably be
open to discussing how they do this, their beef with your current
default stripe chunk size notwithstanding. ;)
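
As an aside, per-cpu state is also what drags in the cpu-hotplug
bookkeeping Shaohua's V2->V3 note mentions fixing. Roughly, against the
3.x notifier API and assuming the percpu_dispatch layout sketched
earlier (again illustrative, not the actual patch):

    #include <linux/cpu.h>
    #include <linux/notifier.h>

    /* When a CPU dies, its bio list must not be stranded: requeue the
     * work on any online CPU so pending writes still go out. */
    static int dispatch_cpu_notify(struct notifier_block *nb,
                                   unsigned long action, void *hcpu)
    {
            long cpu = (long)hcpu;
            struct percpu_dispatch *pd = per_cpu_ptr(dispatch, cpu);

            switch (action & ~CPU_TASKS_FROZEN) {
            case CPU_DEAD:
                    spin_lock(&pd->lock);
                    if (!bio_list_empty(&pd->bios))
                            queue_work(md_wq, &pd->work); /* any CPU */
                    spin_unlock(&pd->lock);
                    break;
            }
            return NOTIFY_OK;
    }

    static struct notifier_block dispatch_nb = {
            .notifier_call = dispatch_cpu_notify,
    };

    /* register_cpu_notifier(&dispatch_nb) at init,
     * unregister_cpu_notifier(&dispatch_nb) at teardown. */

That extra machinery is presumably part of what feels heavy handed; the
question is whether the throughput justifies it.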
--
Stan
Thread overview: 13+ messages
2012-06-13 9:11 [patch 0/3 v3] MD: improve raid1/10 write performance for fast storage Shaohua Li
2012-06-13 9:11 ` [patch 1/3 v3] MD: add a specific workqueue to do dispatch Shaohua Li
2012-06-13 9:11 ` [patch 2/3 v3] raid1: percpu dispatch for write request if bitmap supported Shaohua Li
2012-06-13 9:11 ` [patch 3/3 v3] raid10: " Shaohua Li
2012-06-28 9:03 ` [patch 0/3 v3] MD: improve raid1/10 write performance for fast storage NeilBrown
2012-06-29 1:29 ` Stan Hoeppner
2012-06-29 2:52 ` NeilBrown
2012-06-29 3:02 ` Roberto Spadim
2012-06-30 4:37 ` Stan Hoeppner [this message]
2012-06-29 6:10 ` Shaohua Li
2012-07-02 7:36 ` Shaohua Li
2012-07-03 8:58 ` Shaohua Li
2012-07-04 1:45 ` NeilBrown