From: hare@suse.de (Hannes Reinecke)
Subject: dm-multipath low performance with blk-mq
Date: Sat, 30 Jan 2016 09:52:32 +0100 [thread overview]
Message-ID: <56AC79D0.5060104@suse.de> (raw)
In-Reply-To: <20160129233504.GA13661@redhat.com>
On 01/30/2016 12:35 AM, Mike Snitzer wrote:
> On Wed, Jan 27 2016 at 12:56pm -0500,
> Sagi Grimberg <sagig@dev.mellanox.co.il> wrote:
>
>>
>>
>> On 27/01/2016 19:48, Mike Snitzer wrote:
>>> On Wed, Jan 27 2016 at 6:14am -0500,
>>> Sagi Grimberg <sagig@dev.mellanox.co.il> wrote:
>>>
>>>>
>>>>>> I don't think this is going to help __multipath_map() without some
>>>>>> configuration changes. Now that we're running on already merged
>>>>>> requests instead of bios, the m->repeat_count is almost always set to 1,
>>>>>> so we call the path_selector every time, which means that we'll always
>>>>>> need the write lock. Bumping up the number of IOs we send before calling
>>>>>> the path selector every time, which means that we'll always
>>>>>> the path selector again will give this patch a chance to do some good
>>>>>> here.
>>>>>>
>>>>>> To do that you need to set:
>>>>>>
>>>>>> rr_min_io_rq <something_bigger_than_one>
>>>>>>
>>>>>> in the defaults section of /etc/multipath.conf and then reload the
>>>>>> multipathd service.
>>>>>>
>>>>>> The patch should hopefully help in multipath_busy() regardless of
>>>>>> the rr_min_io_rq setting.
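For reference, a minimal /etc/multipath.conf fragment for the setting
described above would look like this (the value 100 is purely
illustrative, not a tuning recommendation):

```
defaults {
        rr_min_io_rq 100
}
```

followed by a reload of the multipathd service (e.g. `systemctl reload
multipathd` or `multipathd reconfigure`) to make it take effect.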
>>>>>
>>>>> This patch, while generic, is meant to help the blk-mq case. A blk-mq
>>>>> request_queue doesn't have an elevator so the requests will not have
>>>>> seen merging.
>>>>>
>>>>> But yes, implied in the patch is the requirement to increase
>>>>> m->repeat_count via multipathd's rr_min_io_rq (I'll backfill a proper
>>>>> header once it is tested).
>>>>
>>>> I'll test it once I get some spare time (hopefully soon...)
>>>
>>> OK thanks.
>>>
>>> BTW, I _cannot_ get null_blk to come even close to your reported 1500K+
>>> IOPs on 2 "fast" systems I have access to. Which arguments are you
>>> loading the null_blk module with?
>>>
>>> I've been using:
>>> modprobe null_blk gb=4 bs=4096 nr_devices=1 queue_mode=2 submit_queues=12
>>
>> $ for f in /sys/module/null_blk/parameters/*; do echo $f; cat $f; done
>> /sys/module/null_blk/parameters/bs
>> 512
>> /sys/module/null_blk/parameters/completion_nsec
>> 10000
>> /sys/module/null_blk/parameters/gb
>> 250
>> /sys/module/null_blk/parameters/home_node
>> -1
>> /sys/module/null_blk/parameters/hw_queue_depth
>> 64
>> /sys/module/null_blk/parameters/irqmode
>> 1
>> /sys/module/null_blk/parameters/nr_devices
>> 2
>> /sys/module/null_blk/parameters/queue_mode
>> 2
>> /sys/module/null_blk/parameters/submit_queues
>> 24
>> /sys/module/null_blk/parameters/use_lightnvm
>> N
>> /sys/module/null_blk/parameters/use_per_node_hctx
>> N
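For comparison with your modprobe line, the sysfs values above should
correspond to a module load roughly like the following (untested sketch
reconstructed from the parameter dump; unlisted parameters keep their
defaults):

```
modprobe null_blk bs=512 completion_nsec=10000 gb=250 hw_queue_depth=64 \
        irqmode=1 nr_devices=2 queue_mode=2 submit_queues=24
```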
>>
>> $ fio --group_reporting --rw=randread --bs=4k --numjobs=24
>> --iodepth=32 --runtime=99999999 --time_based --loops=1
>> --ioengine=libaio --direct=1 --invalidate=1 --randrepeat=1
>> --norandommap --exitall --name task_nullb0 --filename=/dev/nullb0
>> task_nullb0: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
>> ioengine=libaio, iodepth=32
>> ...
>> fio-2.1.10
>> Starting 24 processes
>> Jobs: 24 (f=24): [rrrrrrrrrrrrrrrrrrrrrrrr] [0.0% done]
>> [7234MB/0KB/0KB /s] [1852K/0/0 iops] [eta 1157d:09h:46m:22s]
>
> Your test above is prone to exhaust the dm-mpath blk-mq tags (128)
> because 24 threads * 32 easily exceeds 128 (by a factor of 6).
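The arithmetic behind that tag exhaustion, as a quick sanity check:

```python
# Outstanding requests the fio invocation above can keep in flight,
# versus the dm-mpath blk-mq tag depth.
numjobs, iodepth = 24, 32
outstanding = numjobs * iodepth          # 768 requests in flight
tags = 128                               # tag_set.queue_depth (BLKDEV_MAX_RQ)
print(outstanding, outstanding // tags)  # 768 6
```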
>
> I found that we were context switching (via bt_get's io_schedule)
> waiting for tags to become available.
>
> This is embarrassing but, until Jens told me today, I was oblivious to
> the fact that the number of blk-mq's tags per hw_queue was defined by
> tag_set.queue_depth.
>
> Previously request-based DM's blk-mq support had:
> md->tag_set.queue_depth = BLKDEV_MAX_RQ; (again: 128)
>
> Now I have a patch that allows tuning queue_depth via dm_mod module
> parameter. And I'll likely bump the default to 4096 or something (doing
> so eliminated blocking in bt_get).
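For illustration, making the depth tunable could look roughly like this
(a sketch only; the parameter name dm_mq_queue_depth and the default
shown are my assumptions, not the actual patch):

```
/* Sketch: module parameter controlling the dm-mq tag depth.
 * Name and default are illustrative, not from the real patch. */
static unsigned dm_mq_queue_depth = 4096;
module_param(dm_mq_queue_depth, uint, S_IRUGO);
MODULE_PARM_DESC(dm_mq_queue_depth, "Queue depth for dm-mq's blk-mq tag set");

/* ...later, when setting up the request-based DM tag set: */
md->tag_set.queue_depth = dm_mq_queue_depth;  /* was BLKDEV_MAX_RQ (128) */
```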
>
> But eliminating the tags bottleneck only raised my read IOPs from ~600K
> to ~800K (using 1 hw_queue for both null_blk and dm-mpath).
>
> When I raise nr_hw_queues to 4 for null_blk (keeping dm-mq at 1) I see a
> whole lot more context switching due to request-based DM's use of
> ksoftirqd (and kworkers) for request completion.
>
> So I'm moving on to optimizing the completion path. But at least some
> progress was made, more to come...
>
Would you mind sharing your patches?
We're currently doing tests with a high-performance FC setup
(16G FC with all-flash storage), and are still 20% short of the
announced backend performance.
Just as a side note: we're currently getting 550k IOPs.
With unpatched dm-mpath.
So nearly on par with your null_blk setup, but with real hardware.
(Which in itself is pretty cool. You should get faster RAM :-)
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare at suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)