From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org,
Christoph Hellwig <hch@infradead.org>,
Bart Van Assche <bart.vanassche@sandisk.com>,
Laurence Oberman <loberman@redhat.com>,
Paolo Valente <paolo.valente@linaro.org>,
Oleksandr Natalenko <oleksandr@natalenko.name>,
Tom Nguyen <tom81094@gmail.com>,
linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org,
Omar Sandoval <osandov@fb.com>,
John Garry <john.garry@huawei.com>
Subject: Re: [PATCH V7 4/6] blk-mq: introduce .get_budget and .put_budget in blk_mq_ops
Date: Sat, 14 Oct 2017 00:07:37 +0800
Message-ID: <20171013160731.GA30899@ming.t460p>
In-Reply-To: <6efdb459-8746-562d-06dc-5b3e172076e1@kernel.dk>
On Fri, Oct 13, 2017 at 08:44:23AM -0600, Jens Axboe wrote:
> On 10/12/2017 06:19 PM, Ming Lei wrote:
> > On Thu, Oct 12, 2017 at 12:46:24PM -0600, Jens Axboe wrote:
> >> On 10/12/2017 12:37 PM, Ming Lei wrote:
> >>> For SCSI devices, there is often a per-request-queue depth, which needs
> >>> to be respected before queuing one request.
> >>>
> >>> The current blk-mq always dequeues one request first, then calls .queue_rq()
> >>> to dispatch the request to the LLD. One obvious issue with this approach is
> >>> that I/O merging may suffer, because when the per-request-queue depth can't
> >>> be respected, .queue_rq() has to return BLK_STS_RESOURCE, and the request
> >>> then has to stay in the hctx->dispatch list, never getting a chance to
> >>> participate in I/O merging.
> >>>
> >>> This patch introduces the .get_budget and .put_budget callbacks in blk_mq_ops,
> >>> so that we can try to get reserved budget first, before dequeuing a request.
> >>> If we can't get budget for queueing I/O, we don't need to dequeue a request
> >>> at all, and I/O merging can be improved a lot.
> >>
> >> I can't help but think that it would be cleaner to just be able to
> >> reinsert the request into the scheduler properly, if we fail to
> >> dispatch it. Bart hinted at that earlier as well.
> >
> > Actually, when I started to investigate the issue, the first thing I tried
> > was to reinsert, but that approach is even worse on qla2xxx.
> >
> > Once a request is dequeued, the chance of an I/O merge decreases a lot.
> > With the none scheduler, merging becomes impossible because
> > we only try to merge against the last 8 requests. With mq-deadline,
> > when one request is reinserted, another request may be dequeued
> > at the same time.
>
> I don't care too much about 'none'. If perfect merging is crucial for
> getting to the performance level you want on the hardware you are using,
> you should not be using 'none'. 'none' will work perfectly fine for NVMe
> etc style devices, where we are not dependent on merging to the same
> extent that we are on other devices.
>
> mq-deadline reinsertion will be expensive, that's in the nature of that
> beast. It's basically the same as a normal request insertion. So for
> that, we'd have to be a bit careful not to run into this too much. Even
> with a dumb approach, it should only happen 1 out of N times, where N is
> the typical point at which the device will return STS_RESOURCE. The
> reinsertion vs dequeue should be serialized with your patch to do that,
> at least for the single queue mq-deadline setup. In fact, I think your
> approach suffers from that same basic race, in that the budget isn't a
> hard allocation, it's just a hint. It can change from the time you check
> it, and when you go and dispatch the IO, if you don't serialize that
> part. So really should be no different in that regard.
In the case of SCSI, .get_budget is implemented with atomic counting,
so it is completely effective at avoiding unnecessary dequeues. Please take
a look at patch 6.
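To illustrate the idea only, here is a minimal userspace sketch of
budget-before-dequeue with an atomic counter. It is not the code from
patch 6: struct fake_queue, budget_get(), budget_put() and dispatch() are
hypothetical names, and C11 atomics stand in for the kernel's atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a request queue with a per-queue depth limit. */
struct fake_queue {
	atomic_int in_flight;	/* budget currently held (requests issued) */
	int depth;		/* per-request-queue depth limit */
	int pending;		/* requests still sitting in the scheduler */
};

/* Try to reserve one slot; returns false when the queue is already full. */
static bool budget_get(struct fake_queue *q)
{
	int prev = atomic_fetch_add(&q->in_flight, 1);

	if (prev >= q->depth) {
		/* Over the limit: undo the reservation and report failure. */
		atomic_fetch_sub(&q->in_flight, 1);
		return false;
	}
	return true;
}

/* Give a slot back, e.g. when a request completes. */
static void budget_put(struct fake_queue *q)
{
	int prev = atomic_fetch_sub(&q->in_flight, 1);

	assert(prev > 0);	/* returning budget that was never taken is a bug */
}

/*
 * Reserve budget *before* dequeuing, so a request that cannot be issued
 * stays in the scheduler, where it can still be merged.
 * Returns the number of requests dispatched.
 */
static int dispatch(struct fake_queue *q)
{
	int dispatched = 0;

	while (q->pending > 0) {
		if (!budget_get(q))
			break;		/* no budget: leave requests queued */
		q->pending--;		/* "dequeue" one request */
		dispatched++;		/* pretend .queue_rq() accepted it */
	}
	return dispatched;
}
```

Because the reservation and the limit check go through the same atomic
counter, two contexts can't both conclude that a slot is free, so in this
sketch the budget is a hard allocation rather than a hint.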
>
> > Not to mention the cost of acquiring/releasing the lock; that is
> > just useless work that wastes CPU.
>
> Sure, my point is that if it doesn't happen too often, it doesn't really
> matter. It's not THAT expensive.
Actually it is in the hot path. For example, the queue depth of lpfc and
qla2xxx is 3, so it is quite easy to trigger STS_RESOURCE.
Thanks,
Ming