From: Jens Axboe <axboe@kernel.dk>
To: Ming Lei <ming.lei@redhat.com>, Omar Sandoval <osandov@osandov.com>
Cc: linux-block@vger.kernel.org,
Christoph Hellwig <hch@infradead.org>,
Mike Snitzer <snitzer@redhat.com>,
dm-devel@redhat.com, Bart Van Assche <bart.vanassche@sandisk.com>,
Laurence Oberman <loberman@redhat.com>,
Paolo Valente <paolo.valente@linaro.org>,
Oleksandr Natalenko <oleksandr@natalenko.name>,
Tom Nguyen <tom81094@gmail.com>,
linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org,
Omar Sandoval <osandov@fb.com>
Subject: Re: [PATCH V6 4/5] blk-mq-sched: improve dispatching from sw queue
Date: Thu, 12 Oct 2017 08:52:12 -0600 [thread overview]
Message-ID: <82efff77-6b17-8bd8-e425-c143284ebb07@kernel.dk> (raw)
In-Reply-To: <20171012100107.GA28224@ming.t460p>
On 10/12/2017 04:01 AM, Ming Lei wrote:
> On Tue, Oct 10, 2017 at 11:23:45AM -0700, Omar Sandoval wrote:
>> On Mon, Oct 09, 2017 at 07:24:23PM +0800, Ming Lei wrote:
>>> SCSI devices use a host-wide tagset, so the shared driver tag space is
>>> often quite big. Meanwhile there is also a per-LUN queue depth
>>> (.cmd_per_lun), which is often small; on both lpfc and qla2xxx, for
>>> example, .cmd_per_lun is just 3.
>>>
>>> So lots of requests may sit in the sw queue, and we always flush all
>>> of those belonging to the same hw queue and dispatch them to the
>>> driver. Unfortunately that easily makes the queue busy because of the
>>> small .cmd_per_lun. Once these requests are flushed out, they have to
>>> stay in hctx->dispatch, no bio can be merged into them anymore, and
>>> sequential IO performance is hurt a lot.
>>>
>>> This patch introduces blk_mq_dequeue_from_ctx() for dequeuing requests
>>> from the sw queue so that we can dispatch them the way a scheduler
>>> would, avoiding dequeuing too many requests from the sw queue while
>>> ->dispatch isn't completely flushed.
>>>
>>> When there is a per-request-queue queue depth, this patch improves
>>> dispatching from the sw queue by taking requests one by one, just like
>>> an IO scheduler does.
>>
>> This still didn't address Jens' concern about using q->queue_depth as
>> the heuristic for whether to do the full sw queue flush or one-by-one
>> dispatch. The EWMA approach is a bit too complex for now; can you please
>> try the heuristic of whether the driver ever returned BLK_STS_RESOURCE?
>
> That can be done easily, but I am not sure it is a good idea.
>
> For example, kmalloc(GFP_ATOMIC) is often used in NVMe's queue_rq path.
> If kmalloc() returns NULL just once, BLK_STS_RESOURCE is returned to
> blk-mq, and blk-mq will then never do a full sw queue flush again, even
> if kmalloc() always succeeds from that time on.

Have it be a bit more than a single bit, then. Reset it every x IOs or
something like that; that will be more representative of transient busy
conditions anyway.
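
As a rough userspace sketch of that "reset every x IOs" idea (the names
dispatch_state, mark_busy, account_dispatch, and BUSY_RESET_INTERVAL are
all made up for illustration here, not actual blk-mq code):

```c
#include <assert.h>
#include <stdbool.h>

/* How many dispatched IOs before we forget old busy events. */
#define BUSY_RESET_INTERVAL 128

struct dispatch_state {
	unsigned int dispatched;	/* IOs dispatched since last reset */
	unsigned int busy_hits;		/* BLK_STS_RESOURCE seen this window */
};

/* Called when the driver returns BLK_STS_RESOURCE. */
static void mark_busy(struct dispatch_state *ds)
{
	ds->busy_hits++;
}

/*
 * Called once per dispatched request; periodically clears the busy
 * count so a single transient allocation failure does not disable
 * full sw queue flushes forever.
 */
static void account_dispatch(struct dispatch_state *ds)
{
	if (++ds->dispatched >= BUSY_RESET_INTERVAL) {
		ds->dispatched = 0;
		ds->busy_hits = 0;
	}
}

/* A full sw queue flush is allowed unless the device looked busy recently. */
static bool can_full_flush(const struct dispatch_state *ds)
{
	return ds->busy_hits == 0;
}
```

With something like this, the kmalloc(GFP_ATOMIC) failure above only
suppresses full flushes for one window of IOs instead of permanently.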
> Even the EWMA approach isn't good on SCSI-MQ either, because some SCSI
> devices' .cmd_per_lun is very small (3 on lpfc and qla2xxx), so a
> single full flush will easily trigger BLK_STS_RESOURCE.
>
> So I suggest we use the q->queue_depth check first, since we haven't
> received performance degradation reports for other devices
> (!q->queue_depth) with blk-mq. We can improve on this in the future if
> we find a better approach.
>
> What do you think about it?

I think it's absolutely horrible, and I already explained why in great
detail in an earlier review. The tl;dr is that the fact that only SCSI
sets ->queue_depth right now is purely coincidentally related to the
fact that only SCSI also shares tags. Every driver should set the queue
depth, so that signal will go to zero very quickly.
--
Jens Axboe
2017-10-09 11:24 [PATCH V6 0/5] blk-mq-sched: improve sequential I/O performance Ming Lei
2017-10-09 11:24 ` [PATCH V6 1/5] blk-mq-sched: fix scheduler bad performance Ming Lei
2017-10-10 18:10 ` Omar Sandoval
2017-10-09 11:24 ` [PATCH V6 2/5] blk-mq-sched: move actual dispatching into one helper Ming Lei
2017-10-09 11:24 ` [PATCH V6 3/5] sbitmap: introduce __sbitmap_for_each_set() Ming Lei
2017-10-10 18:15 ` Omar Sandoval
2017-10-09 11:24 ` [PATCH V6 4/5] blk-mq-sched: improve dispatching from sw queue Ming Lei
2017-10-10 18:23 ` Omar Sandoval
2017-10-12 10:01 ` Ming Lei
2017-10-12 14:52 ` Jens Axboe [this message]
2017-10-12 15:22 ` Ming Lei
2017-10-12 15:24 ` Jens Axboe
2017-10-12 15:33 ` Bart Van Assche
2017-10-12 15:37 ` Jens Axboe
2017-10-12 15:49 ` Ming Lei
2017-10-09 11:24 ` [PATCH V6 5/5] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed Ming Lei
2017-10-10 18:26 ` Omar Sandoval