Re: [PATCH 04/14] blk-mq-sched: improve dispatching from sw queue

From: Ming Lei <ming.lei@redhat.com>
To: Bart Van Assche <Bart.VanAssche@wdc.com>
Cc: "hch@infradead.org" <hch@infradead.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"axboe@fb.com" <axboe@fb.com>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"jejb@linux.vnet.ibm.com" <jejb@linux.vnet.ibm.com>,
	"martin.petersen@oracle.com" <martin.petersen@oracle.com>
Subject: Re: [PATCH 04/14] blk-mq-sched: improve dispatching from sw queue
Date: Tue, 1 Aug 2017 18:50:15 +0800
Message-ID: <20170801105013.GD31452@ming.t460p>
In-Reply-To: <20170801101718.GB31452@ming.t460p>

On Tue, Aug 01, 2017 at 06:17:18PM +0800, Ming Lei wrote:
> On Mon, Jul 31, 2017 at 11:34:35PM +0000, Bart Van Assche wrote:
> > On Tue, 2017-08-01 at 00:51 +0800, Ming Lei wrote:
> > > SCSI devices use a host-wide tagset, and the
> > > shared driver tag space is often quite large.
> > > Meanwhile, each LUN also has its own queue depth
> > > (.cmd_per_lun), which is often small.
> > > 
> > > As a result, many requests may sit in the sw
> > > queue. We always flush all requests belonging to
> > > the same hw queue and dispatch them all to the
> > > driver; unfortunately, the small per-LUN queue
> > > depth makes it easy to hit queue busy. Once these
> > > requests are flushed out, they have to stay in
> > > hctx->dispatch, where no bio can be merged into
> > > them, so sequential IO performance suffers.
> > > 
> > > This patch improves dispatching from the sw queue
> > > when a per-request-queue queue depth is set
> > > (q->queue_depth), by taking requests from the sw
> > > queue one at a time, just as an IO scheduler does.
> > > 
> > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > ---
> > >  block/blk-mq-sched.c | 25 +++++++++++++++----------
> > >  1 file changed, 15 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > > index 47a25333a136..3510c01cb17b 100644
> > > --- a/block/blk-mq-sched.c
> > > +++ b/block/blk-mq-sched.c
> > > @@ -96,6 +96,9 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > >  	const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
> > >  	bool can_go = true;
> > >  	LIST_HEAD(rq_list);
> > > +	struct request *(*dispatch_fn)(struct blk_mq_hw_ctx *) =
> > > +		has_sched_dispatch ? e->type->ops.mq.dispatch_request :
> > > +			blk_mq_dispatch_rq_from_ctxs;
> > >  
> > >  	/* RCU or SRCU read lock is needed before checking quiesced flag */
> > >  	if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
> > > @@ -126,26 +129,28 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > >  	if (!list_empty(&rq_list)) {
> > >  		blk_mq_sched_mark_restart_hctx(hctx);
> > >  		can_go = blk_mq_dispatch_rq_list(q, &rq_list);
> > > -	} else if (!has_sched_dispatch) {
> > > +	} else if (!has_sched_dispatch && !q->queue_depth) {
> > >  		blk_mq_flush_busy_ctxs(hctx, &rq_list);
> > >  		blk_mq_dispatch_rq_list(q, &rq_list);
> > > +		can_go = false;
> > >  	}
> > >  
> > > +	if (!can_go)
> > > +		return;
> > > +
> > >  	/*
> > >  	 * We want to dispatch from the scheduler if we had no work left
> > >  	 * on the dispatch list, OR if we did have work but weren't able
> > >  	 * to make progress.
> > >  	 */
> > > -	if (can_go && has_sched_dispatch) {
> > > -		do {
> > > -			struct request *rq;
> > > +	do {
> > > +		struct request *rq;
> > >  
> > > -			rq = e->type->ops.mq.dispatch_request(hctx);
> > > -			if (!rq)
> > > -				break;
> > > -			list_add(&rq->queuelist, &rq_list);
> > > -		} while (blk_mq_dispatch_rq_list(q, &rq_list));
> > > -	}
> > > +		rq = dispatch_fn(hctx);
> > > +		if (!rq)
> > > +			break;
> > > +		list_add(&rq->queuelist, &rq_list);
> > > +	} while (blk_mq_dispatch_rq_list(q, &rq_list));
> > >  }
> > 
> > Hello Ming,
> > 
> > Although I like the idea behind this patch, I'm afraid that it will
> > cause a performance regression for high-performance SCSI LLDs, e.g.
> > ib_srp. Have you considered reworking this patch as follows:
> > * Remove the code under "else if (!has_sched_dispatch && !q->queue_depth) {".
> 
> This would affect devices such as NVMe, on which busy is basically
> never triggered, so it is better not to do this.
> 
> > * Modify all blk_mq_dispatch_rq_list() functions such that they dispatch up
> >   to cmd_per_lun - (number of requests in progress) at once.
> 
> How can we get an accurate 'number of requests in progress' efficiently?
> 
> We already do it this way for the blk-mq scheduler, so it
> shouldn't be a problem.
> 
> My test data for mq-deadline on lpfc shows good performance;
> please see the cover letter.

Forgot to mention: ctx->list is a per-CPU list protected by a per-CPU
lock, so switching to this approach shouldn't cause a performance issue.

-- 
Ming

Thread overview: 47+ messages
2017-07-31 16:50 [PATCH 00/14] blk-mq-sched: fix SCSI-MQ performance regression Ming Lei
2017-07-31 16:50 ` [PATCH 01/14] blk-mq-sched: fix scheduler bad performance Ming Lei
2017-07-31 23:00   ` Bart Van Assche
2017-07-31 16:50 ` [PATCH 02/14] blk-mq: rename flush_busy_ctx_data as ctx_iter_data Ming Lei
2017-07-31 23:03   ` Bart Van Assche
2017-07-31 16:51 ` [PATCH 03/14] blk-mq: introduce blk_mq_dispatch_rq_from_ctxs() Ming Lei
2017-07-31 23:09   ` Bart Van Assche
2017-08-01 10:07     ` Ming Lei
2017-08-02 17:19   ` kbuild test robot
2017-07-31 16:51 ` [PATCH 04/14] blk-mq-sched: improve dispatching from sw queue Ming Lei
2017-07-31 23:34   ` Bart Van Assche
2017-08-01 10:17     ` Ming Lei
2017-08-01 10:50       ` Ming Lei [this message]
2017-08-01 15:11         ` Bart Van Assche
2017-08-02  3:31           ` Ming Lei
2017-08-03  1:35             ` Bart Van Assche
2017-08-03  3:13               ` Ming Lei
2017-08-03 17:33                 ` Bart Van Assche
2017-08-05  8:40                   ` hch
2017-08-05 13:40                   ` Ming Lei
2017-07-31 16:51 ` [PATCH 05/14] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed Ming Lei
2017-07-31 23:42   ` Bart Van Assche
2017-08-01 10:44     ` Ming Lei
2017-08-01 16:14       ` Bart Van Assche
2017-08-02  3:01         ` Ming Lei
2017-08-03  1:33           ` Bart Van Assche
2017-07-31 16:51 ` [PATCH 06/14] blk-mq-sched: introduce blk_mq_sched_queue_depth() Ming Lei
2017-07-31 16:51 ` [PATCH 07/14] blk-mq-sched: use q->queue_depth as hint for q->nr_requests Ming Lei
2017-07-31 16:51 ` [PATCH 08/14] blk-mq: introduce BLK_MQ_F_SHARED_DEPTH Ming Lei
2017-07-31 16:51 ` [PATCH 09/14] blk-mq-sched: cleanup blk_mq_sched_dispatch_requests() Ming Lei
2017-07-31 16:51 ` [PATCH 10/14] blk-mq-sched: introduce helpers for query, change busy state Ming Lei
2017-07-31 16:51 ` [PATCH 11/14] blk-mq: introduce helpers for operating ->dispatch list Ming Lei
2017-07-31 16:51 ` [PATCH 12/14] blk-mq: introduce pointers to dispatch lock & list Ming Lei
2017-07-31 16:51 ` [PATCH 13/14] blk-mq: pass 'request_queue *' to several helpers of operating BUSY Ming Lei
2017-07-31 16:51 ` [PATCH 14/14] blk-mq-sched: improve IO scheduling on SCSI devcie Ming Lei