From: Mike Snitzer <snitzer@redhat.com>
To: axboe@kernel.dk, Hannes Reinecke <hare@suse.de>,
Sagi Grimberg <sagig@dev.mellanox.co.il>,
Christoph Hellwig <hch@infradead.org>
Cc: "keith.busch@intel.com" <keith.busch@intel.com>,
linux-block@vger.kernel.org,
device-mapper development <dm-devel@redhat.com>,
"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
Bart Van Assche <bart.vanassche@sandisk.com>
Subject: Re: [RFC PATCH] dm: fix excessive dm-mq context switching
Date: Fri, 5 Feb 2016 13:05:15 -0500
Message-ID: <20160205180515.GA25808@redhat.com>
In-Reply-To: <20160205151334.GA82754@redhat.com>
On Fri, Feb 05 2016 at 10:13am -0500,
Mike Snitzer <snitzer@redhat.com> wrote:
> Following is RFC because it really speaks to dm-mq _needing_ a variant
> of blk_mq_complete_request() that supports partial completions. Not
> supporting partial completions really isn't an option for DM multipath.
>
> From: Mike Snitzer <snitzer@redhat.com>
> Date: Fri, 5 Feb 2016 08:49:01 -0500
> Subject: [RFC PATCH] dm: fix excessive dm-mq context switching
>
> Request-based DM's blk-mq support (dm-mq) was reported to be 50% slower
> than if an underlying null_blk device were used directly. The biggest
> reason for this drop in performance is that blk_insert_clone_request()
> was calling blk_mq_insert_request() with @async=true. This forced the
> use of kblockd_schedule_delayed_work_on() to run the queues which
> ushered in ping-ponging between process context (fio in this case) and
> kblockd's kworker to submit the cloned request. The ftrace
> function_graph tracer showed:
>
> kworker-2013 => fio-12190
> fio-12190 => kworker-2013
> ...
> kworker-2013 => fio-12190
> fio-12190 => kworker-2013
> ...
>
> Fixing blk_mq_insert_request() to _not_ use kblockd to submit the cloned
> requests isn't enough to eliminate the observed context switches.
>
> In addition to this dm-mq specific blk-core fix, there were 2 DM core
> fixes to dm-mq that (when paired with the blk-core fix) completely
> eliminate the observed context switching:
>
> 1) don't blk_mq_run_hw_queues in blk-mq request completion
>
> Motivated by desire to reduce overhead of dm-mq, punting to kblockd
> just increases context switches.
>
> In my testing against a really fast null_blk device there was no benefit
> to running blk_mq_run_hw_queues() on completion (and no other blk-mq
> driver does this). So hopefully this change doesn't induce the need for
> yet another revert like commit 621739b00e16ca2d !
>
> 2) use blk_mq_complete_request() in dm_complete_request()
>
> blk_complete_request() doesn't offer the traditional q->mq_ops vs
> .request_fn branching pattern that other historic block interfaces
> do (e.g. blk_get_request). Using blk_mq_complete_request() for
> blk-mq requests is important for performance but it doesn't handle
> partial completions -- which is a pretty big problem given the
> potential for partial completions with DM multipath due to path
> failure(s). As such this makes this entire patch only RFC-worthy.
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index c683f6d..a618477 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1344,7 +1340,10 @@ static void dm_complete_request(struct request *rq, int error)
>  	struct dm_rq_target_io *tio = tio_from_request(rq);
>  
>  	tio->error = error;
> -	blk_complete_request(rq);
> +	if (!rq->q->mq_ops)
> +		blk_complete_request(rq);
> +	else
> +		blk_mq_complete_request(rq, rq->errors);
>  }
>  
>  /*
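
For illustration only, the branch added by the hunk above can be modeled in
userspace. This is a minimal sketch, not kernel code: every type and function
name ending in _model, and the two call counters, are hypothetical stand-ins
for struct request, struct request_queue, blk_complete_request() and
blk_mq_complete_request(). It only shows which completion path is chosen
depending on whether the queue has mq_ops set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; only the fields
 * that the branch inspects are modeled. */
struct mq_ops_model { int unused; };
struct queue_model { const struct mq_ops_model *mq_ops; };
struct request_model {
	struct queue_model *q;
	int errors;     /* models rq->errors */
	int tio_error;  /* models tio->error */
};

/* Counters so a caller can observe which path was taken. */
static int legacy_calls;
static int mq_calls;

/* Models blk_complete_request(): the legacy .request_fn path. */
static void complete_legacy_model(struct request_model *rq)
{
	(void)rq;
	legacy_calls++;
}

/* Models blk_mq_complete_request(): the blk-mq path. */
static void complete_mq_model(struct request_model *rq, int error)
{
	rq->errors = error;
	mq_calls++;
}

/* Mirrors the shape of the patched dm_complete_request(). */
static void dm_complete_request_model(struct request_model *rq, int error)
{
	assert(rq != NULL && rq->q != NULL);
	rq->tio_error = error;
	if (!rq->q->mq_ops)
		complete_legacy_model(rq);
	else
		complete_mq_model(rq, rq->errors);
}
```

A queue whose mq_ops pointer is NULL takes the legacy path and any other queue
takes the blk-mq path; in both cases the error is recorded first, as in the
hunk.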
Looking closer, DM is very likely OK just using blk_mq_complete_request.
blk_complete_request() also doesn't provide native partial completion
support (it relies on the driver to do it, which DM core does):
/**
 * blk_complete_request - end I/O on a request
 * @req:      the request being processed
 *
 * Description:
 *     Ends all I/O on a request. It does not handle partial completions,
 *     unless the driver actually implements this in its completion callback
 *     through requeueing. The actual completion happens out-of-order,
 *     through a softirq handler. The user must have registered a completion
 *     callback through blk_queue_softirq_done().
 **/
blk_mq_complete_request() is effectively implemented in a comparable
fashion to blk_complete_request(), and DM core already provides
partial completion support: dm.c:end_clone_bio() triggers requeueing
of the request via dm-mpath.c:multipath_end_io()'s return of
DM_ENDIO_REQUEUE.
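
To make the division of labor concrete, here is a small userspace model of a
completion callback that handles partial completion by requeueing. It is a
deliberately simplified sketch under stated assumptions, not DM's actual
logic: the struct, the enum values and both functions (all suffixed _model or
_MODEL) are hypothetical, and the rule "any error means requeue" stands in for
whatever policy the target's end_io hook really applies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes, named after the DM_ENDIO_* convention. */
enum endio_model { ENDIO_DONE_MODEL, ENDIO_REQUEUE_MODEL };

struct rq_model {
	unsigned int total;    /* bytes the request asked for */
	unsigned int done;     /* bytes completed so far */
	unsigned int requeues; /* times the remainder was requeued */
};

/* Models a target end_io hook: on error (e.g. a failed path) ask the
 * core to requeue instead of failing the request. Assumption: every
 * error is retryable. */
static enum endio_model end_io_model(int error)
{
	return error ? ENDIO_REQUEUE_MODEL : ENDIO_DONE_MODEL;
}

/* Models the driver's completion callback. The generic completion API
 * only invokes this callback; accounting for the bytes that did
 * complete and requeueing the rest happens here, in the driver.
 * Returns 1 when the request is finished, 0 when it was requeued. */
static int softirq_done_model(struct rq_model *rq, unsigned int bytes_ok,
			      int error)
{
	unsigned int room;

	assert(rq != NULL && rq->done <= rq->total);
	room = rq->total - rq->done;
	if (bytes_ok > room)
		bytes_ok = room;       /* never account past the end */
	rq->done += bytes_ok;

	if (end_io_model(error) == ENDIO_REQUEUE_MODEL) {
		rq->requeues++;        /* remainder is resubmitted later */
		return 0;
	}
	return 1;
}
```

In this model a 4096-byte request that fails after 1024 bytes is requeued
once, and a later error-free completion of the remaining 3072 bytes finishes
it, without the completion API itself knowing anything about partial
completions.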
So I'm thinking I can drop the "RFC" for this patch and run with
it, once I get Jens' feedback (hopefully) confirming my understanding.
Jens, please advise. If you're comfortable providing your Acked-by I
can get this fix in for 4.5-rc4 or so...
Thanks!
Mike