linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe <axboe@fb.com>,
	linux-block@vger.kernel.org,
	Christoph Hellwig <hch@infradead.org>,
	Mike Snitzer <snitzer@redhat.com>,
	dm-devel@redhat.com
Cc: Bart Van Assche <bart.vanassche@sandisk.com>,
	Laurence Oberman <loberman@redhat.com>,
	Paolo Valente <paolo.valente@linaro.org>,
	Oleksandr Natalenko <oleksandr@natalenko.name>,
	Tom Nguyen <tom81094@gmail.com>,
	linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org,
	Omar Sandoval <osandov@fb.com>, Ming Lei <ming.lei@redhat.com>
Subject: [PATCH V5 6/7] blk-mq-sched: improve dispatching from sw queue
Date: Sat, 30 Sep 2017 18:27:19 +0800	[thread overview]
Message-ID: <20170930102720.30219-7-ming.lei@redhat.com> (raw)
In-Reply-To: <20170930102720.30219-1-ming.lei@redhat.com>

SCSI devices use host-wide tagset, and the shared
driver tag space is often quite big. Meantime
there is also queue depth for each lun(.cmd_per_lun),
which is often small.

So lots of requests may stay in sw queue, and we
always flush all belonging to same hw queue and
dispatch them all to driver, unfortunately it is
easy to cause queue busy because of the small
per-lun queue depth. Once these requests are flushed
out, they have to stay in hctx->dispatch, and no bio
merge can participate into these requests, and
sequential IO performance is hurted.

This patch improves dispatching from sw queue when
there is per-request-queue queue depth by taking
request one by one from sw queue, just like the way
of IO scheduler.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Tom Nguyen <tom81094@gmail.com>
Tested-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.c   | 53 ++++++++++++++++++++++++++++++++++++++++++++++++--
 include/linux/blk-mq.h |  2 ++
 2 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 538f363f39ca..3ba112d9dc15 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -105,6 +105,42 @@ static void blk_mq_do_dispatch_sched(struct request_queue *q,
 	} while (blk_mq_dispatch_rq_list(q, &rq_list));
 }
 
+static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
+					  struct blk_mq_ctx *ctx)
+{
+	unsigned idx = ctx->index_hw;
+
+	if (++idx == hctx->nr_ctx)
+		idx = 0;
+
+	return hctx->ctxs[idx];
+}
+
+static void blk_mq_do_dispatch_ctx(struct request_queue *q,
+				   struct blk_mq_hw_ctx *hctx)
+{
+	LIST_HEAD(rq_list);
+	struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from);
+	bool dispatched;
+
+	do {
+		struct request *rq;
+
+		rq = blk_mq_dequeue_from_ctx(hctx, ctx);
+		if (!rq)
+			break;
+		list_add(&rq->queuelist, &rq_list);
+
+		/* round robin for fair dispatch */
+		ctx = blk_mq_next_ctx(hctx, rq->mq_ctx);
+
+		dispatched = blk_mq_dispatch_rq_list(q, &rq_list);
+	} while (dispatched);
+
+	if (!dispatched)
+		WRITE_ONCE(hctx->dispatch_from, ctx);
+}
+
 void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
@@ -142,18 +178,31 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	if (!list_empty(&rq_list)) {
 		blk_mq_sched_mark_restart_hctx(hctx);
 		do_sched_dispatch = blk_mq_dispatch_rq_list(q, &rq_list);
-	} else if (!has_sched_dispatch) {
+	} else if (!has_sched_dispatch && !q->queue_depth) {
+		/*
+		 * If there is no per-request_queue depth, we
+		 * flush all requests in this hw queue, otherwise
+		 * pick up request one by one from sw queue for
+		 * avoiding to mess up I/O merge when dispatch
+		 * run out of resource, which can be triggered
+		 * easily by per-request_queue queue depth
+		 */
 		blk_mq_flush_busy_ctxs(hctx, &rq_list);
 		blk_mq_dispatch_rq_list(q, &rq_list);
 	}
 
+	if (!do_sched_dispatch)
+		return;
+
 	/*
 	 * We want to dispatch from the scheduler if there was nothing
 	 * on the dispatch list or we were able to dispatch from the
 	 * dispatch list.
 	 */
-	if (do_sched_dispatch && has_sched_dispatch)
+	if (has_sched_dispatch)
 		blk_mq_do_dispatch_sched(q, e, hctx);
+	else
+		blk_mq_do_dispatch_ctx(q, hctx);
 }
 
 bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 2747469cedaf..fccabe00fb55 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -30,6 +30,8 @@ struct blk_mq_hw_ctx {
 
 	struct sbitmap		ctx_map;
 
+	struct blk_mq_ctx	*dispatch_from;
+
 	struct blk_mq_ctx	**ctxs;
 	unsigned int		nr_ctx;
 
-- 
2.9.5

  parent reply	other threads:[~2017-09-30 10:27 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-30 10:27 [PATCH V5 00/14] blk-mq-sched: improve sequential I/O performance(part 1) Ming Lei
2017-09-30 10:27 ` [PATCH V5 1/7] blk-mq: issue rq directly in blk_mq_request_bypass_insert() Ming Lei
2017-10-03  8:58   ` Christoph Hellwig
2017-10-03 13:39     ` Ming Lei
2017-09-30 10:27 ` [PATCH V5 2/7] blk-mq-sched: fix scheduler bad performance Ming Lei
2017-10-02 14:19   ` Christoph Hellwig
2017-09-30 10:27 ` [PATCH V5 3/7] sbitmap: introduce __sbitmap_for_each_set() Ming Lei
2017-09-30 10:27 ` [PATCH V5 4/7] blk-mq: introduce blk_mq_dequeue_from_ctx() Ming Lei
2017-10-03  9:01   ` Christoph Hellwig
2017-10-09  4:36     ` Ming Lei
2017-09-30 10:27 ` [PATCH V5 5/7] blk-mq-sched: move actual dispatching into one helper Ming Lei
2017-10-02 14:19   ` Christoph Hellwig
2017-10-09  9:07     ` Ming Lei
2017-09-30 10:27 ` Ming Lei [this message]
2017-10-03  9:05   ` [PATCH V5 6/7] blk-mq-sched: improve dispatching from sw queue Christoph Hellwig
2017-10-09 10:15     ` Ming Lei
2017-09-30 10:27 ` [PATCH V5 7/7] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed Ming Lei
2017-10-03  9:11   ` Christoph Hellwig
2017-10-09 10:40     ` Ming Lei
2017-09-30 10:32 ` [PATCH V5 00/14] blk-mq-sched: improve sequential I/O performance(part 1) Ming Lei
2017-10-09 12:09 ` John Garry
2017-10-09 15:04   ` Ming Lei
2017-10-10  1:46     ` Ming Lei
2017-10-10 12:24       ` John Garry
2017-10-10 12:34         ` Johannes Thumshirn
2017-10-10 12:37           ` Paolo Valente
2017-10-10 13:45         ` Ming Lei
2017-10-10 15:10           ` John Garry

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170930102720.30219-7-ming.lei@redhat.com \
    --to=ming.lei@redhat.com \
    --cc=axboe@fb.com \
    --cc=bart.vanassche@sandisk.com \
    --cc=dm-devel@redhat.com \
    --cc=hch@infradead.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=loberman@redhat.com \
    --cc=oleksandr@natalenko.name \
    --cc=osandov@fb.com \
    --cc=paolo.valente@linaro.org \
    --cc=snitzer@redhat.com \
    --cc=tom81094@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).