From: Mike Snitzer <snitzer@redhat.com>
To: Hannes Reinecke <hare@suse.de>
Cc: axboe@kernel.dk, Christoph Hellwig <hch@infradead.org>,
Sagi Grimberg <sagig@dev.mellanox.co.il>,
"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
"keith.busch@intel.com" <keith.busch@intel.com>,
device-mapper development <dm-devel@redhat.com>,
linux-block@vger.kernel.org,
Bart Van Assche <bart.vanassche@sandisk.com>
Subject: Re: dm-multipath low performance with blk-mq
Date: Sat, 30 Jan 2016 14:12:38 -0500
Message-ID: <20160130191238.GA18686@redhat.com>
In-Reply-To: <56AC79D0.5060104@suse.de>
On Sat, Jan 30 2016 at 3:52am -0500,
Hannes Reinecke <hare@suse.de> wrote:
> On 01/30/2016 12:35 AM, Mike Snitzer wrote:
> >
> >Your test above is prone to exhaust the dm-mpath blk-mq tags (128)
> >because 24 threads * 32 easily exceeds 128 (by a factor of 6).
> >
> >I found that we were context switching (via bt_get's io_schedule)
> >waiting for tags to become available.
> >
> >This is embarrassing but, until Jens told me today, I was oblivious to
> >the fact that the number of blk-mq tags per hw_queue was defined by
> >tag_set.queue_depth.
> >
> >Previously request-based DM's blk-mq support had:
> >md->tag_set.queue_depth = BLKDEV_MAX_RQ; (again: 128)
> >
> >Now I have a patch that allows tuning queue_depth via dm_mod module
> >parameter. And I'll likely bump the default to 4096 or something (doing
> >so eliminated blocking in bt_get).
> >
> >But eliminating the tags bottleneck only raised my read IOPs from ~600K
> >to ~800K (using 1 hw_queue for both null_blk and dm-mpath).
> >
> >When I raise nr_hw_queues to 4 for null_blk (keeping dm-mq at 1) I see a
> >whole lot more context switching due to request-based DM's use of
> >ksoftirqd (and kworkers) for request completion.
> >
> >So I'm moving on to optimizing the completion path. But at least some
> >progress was made, more to come...
> >
>
> Would you mind sharing your patches?
I'm still working through this. I'll hopefully have a handful of
RFC-level changes by end of day Monday, but it could take longer.
One change that I already shared in a previous mail is:
http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=devel2&id=99ebcaf36d9d1fa3acec98492c36664d57ba8fbd
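For the record, the tuning boils down to three dm_mod module parameters
(blk_mq_queue_depth and blk_mq_hw_queues only exist with my WIP patches
applied, so treat those names as provisional):

```shell
# Enable blk-mq request-based DM and bump the per-hw_queue tag depth.
# blk_mq_queue_depth and blk_mq_hw_queues come from my in-progress
# patches and may be renamed before the final RFC.
echo Y    > /sys/module/dm_mod/parameters/use_blk_mq
echo 4096 > /sys/module/dm_mod/parameters/blk_mq_queue_depth
echo 1    > /sys/module/dm_mod/parameters/blk_mq_hw_queues
```

The test script below sets these before running 'dmsetup create', so
they take effect for the dm-mq device being tested.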
> We're currently doing tests with a high-performance FC setup
> (16G FC with all-flash storage), and are still 20% short of the
> announced backend performance.
>
> Just as a side note: we're currently getting 550k IOPs.
> With unpatched dm-mpath.
What is your test workload? If you can share I'll be sure to factor it
into my testing.
> So nearly on par with your null_blk setup, but with real hardware.
> (Which in itself is pretty cool. You should get faster RAM :-)
You've misunderstood what I said my null_blk (RAM) performance is.
My null_blk test gets ~1900K read IOPs. But dm-mpath on top only gets
between 600K and 1000K IOPs depending on $FIO_QUEUE_DEPTH and whether I
use multiple $NULL_BLK_HW_QUEUES.
Here is the script I've been using to test:
#!/bin/sh
set -xv
NULL_BLK_HW_QUEUES=1
NULL_BLK_QUEUE_DEPTH=4096
DM_MQ_HW_QUEUES=1
DM_MQ_QUEUE_DEPTH=4096
FIO=/root/snitm/git/fio/fio
FIO_QUEUE_DEPTH=32
FIO_RUNTIME=10
FIO_NUMJOBS=12
PERF=perf
#PERF=/root/snitm/git/linux/tools/perf/perf
# Run fio against the device in $1; if $2 is set, wrap the run
# under that command (used below for 'perf record').
run_fio() {
    DEVICE=$1
    TASK_NAME=$(basename ${DEVICE})
    PERF_RECORD=$2
    RUN_CMD="${FIO} --cpus_allowed_policy=split --group_reporting --rw=randread --bs=4k --numjobs=${FIO_NUMJOBS} \
        --iodepth=${FIO_QUEUE_DEPTH} --runtime=${FIO_RUNTIME} --time_based --loops=1 --ioengine=libaio \
        --direct=1 --invalidate=1 --randrepeat=1 --norandommap --exitall --name task_${TASK_NAME} --filename=${DEVICE}"
    if [ -n "${PERF_RECORD}" ]; then
        ${PERF_RECORD} ${RUN_CMD}
        mv perf.data perf.data.${TASK_NAME}
    else
        ${RUN_CMD}
    fi
}
# Tear down any previous run and (re)load null_blk
dmsetup remove dm_mq
modprobe -r null_blk
modprobe null_blk gb=4 bs=512 hw_queue_depth=${NULL_BLK_QUEUE_DEPTH} nr_devices=1 queue_mode=2 irqmode=1 completion_nsec=1 submit_queues=${NULL_BLK_HW_QUEUES}

# Baseline: fio directly against null_blk
run_fio /dev/nullb0
run_fio /dev/nullb0 "${PERF} record -ag -e cs"

# Configure dm-mq, then stack a single-path multipath device on nullb0
echo Y > /sys/module/dm_mod/parameters/use_blk_mq
echo ${DM_MQ_QUEUE_DEPTH} > /sys/module/dm_mod/parameters/blk_mq_queue_depth
echo ${DM_MQ_HW_QUEUES} > /sys/module/dm_mod/parameters/blk_mq_hw_queues
echo "0 8388608 multipath 0 0 1 1 service-time 0 1 2 /dev/nullb0 1000 1" | dmsetup create dm_mq

# Same fio runs, now through dm-mpath
run_fio /dev/mapper/dm_mq
run_fio /dev/mapper/dm_mq "${PERF} record -ag -e cs"
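As a sanity check on the tag-exhaustion arithmetic quoted at the top of
this mail (24 fio threads at iodepth 32 vs. the old 128-tag default),
a quick back-of-envelope in shell:

```shell
# Potential in-flight requests fio can generate vs. dm-mq's old default
# tag depth (BLKDEV_MAX_RQ = 128); numbers come from the thread above.
NUMJOBS=24
IODEPTH=32
TAG_DEPTH=128

INFLIGHT=$((NUMJOBS * IODEPTH))
echo "in-flight: ${INFLIGHT}"                           # 768
echo "oversubscription: $((INFLIGHT / TAG_DEPTH))x"     # 6x
```

Which is why bumping queue_depth to 4096 made the bt_get blocking
disappear in my runs.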