From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org,
Christoph Hellwig <hch@infradead.org>,
Bart Van Assche <bart.vanassche@sandisk.com>,
Laurence Oberman <loberman@redhat.com>,
Paolo Valente <paolo.valente@linaro.org>,
Oleksandr Natalenko <oleksandr@natalenko.name>,
Tom Nguyen <tom81094@gmail.com>,
linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org,
Omar Sandoval <osandov@fb.com>,
John Garry <john.garry@huawei.com>
Subject: Re: [PATCH V7 4/6] blk-mq: introduce .get_budget and .put_budget in blk_mq_ops
Date: Sat, 14 Oct 2017 00:21:12 +0800
Message-ID: <20171013162111.GC30899@ming.t460p>
In-Reply-To: <845bd050-8566-8749-d73f-9a3731c7736f@kernel.dk>
On Fri, Oct 13, 2017 at 10:19:04AM -0600, Jens Axboe wrote:
> On 10/13/2017 10:07 AM, Ming Lei wrote:
> > On Fri, Oct 13, 2017 at 08:44:23AM -0600, Jens Axboe wrote:
> >> On 10/12/2017 06:19 PM, Ming Lei wrote:
> >>> On Thu, Oct 12, 2017 at 12:46:24PM -0600, Jens Axboe wrote:
> >>>> On 10/12/2017 12:37 PM, Ming Lei wrote:
> >>>>> For SCSI devices, there is often a per-request-queue depth, which needs
> >>>>> to be respected before queuing a request.
> >>>>>
> >>>>> The current blk-mq always dequeues a request first, then calls .queue_rq()
> >>>>> to dispatch the request to the LLD. One obvious issue with this approach is
> >>>>> that I/O merging may suffer: when the per-request-queue depth can't be
> >>>>> respected, .queue_rq() has to return BLK_STS_RESOURCE, so the request
> >>>>> has to stay in the hctx->dispatch list and never gets a chance to take
> >>>>> part in I/O merging.
> >>>>>
> >>>>> This patch introduces .get_budget and .put_budget callbacks in blk_mq_ops,
> >>>>> so that we can try to reserve budget before dequeuing a request.
> >>>>> If we can't get budget for queueing I/O, we don't dequeue the request
> >>>>> at all, which improves I/O merging a lot.
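The budget-first ordering described in the patch text can be illustrated with a toy, single-threaded model. Everything below (`struct toy_dev`, `struct toy_sched`, `toy_get_budget()`, `toy_put_budget()`, `toy_dispatch()`) is hypothetical; it is not the blk-mq code and does not reproduce the real `blk_mq_ops` signatures. It only shows the idea: reserve budget first, dequeue only when a slot is held, and give the budget back if there turns out to be nothing to dequeue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device with a fixed queue depth (e.g. 3 for lpfc/qla2xxx
 * as mentioned in this thread). */
struct toy_dev {
	int queue_depth;  /* max requests the LLD accepts at once */
	int busy;         /* requests currently owned by the LLD */
};

/* Hypothetical scheduler queue: `pending` counts requests that are still
 * in the scheduler, where later bios can be merged into them. */
struct toy_sched {
	int pending;
};

/* Stand-in for a .get_budget callback: reserve one slot or fail. */
static bool toy_get_budget(struct toy_dev *dev)
{
	if (dev->busy >= dev->queue_depth)
		return false;
	dev->busy++;
	return true;
}

/* Stand-in for a .put_budget callback: release an unused reservation. */
static void toy_put_budget(struct toy_dev *dev)
{
	assert(dev->busy > 0);
	dev->busy--;
}

/* Budget-first dispatch: a request leaves the scheduler only after a
 * device slot has been reserved for it. Returns the number dispatched. */
static int toy_dispatch(struct toy_sched *sched, struct toy_dev *dev)
{
	int dispatched = 0;

	for (;;) {
		if (!toy_get_budget(dev))
			break;               /* device full: nothing is dequeued */
		if (sched->pending == 0) {
			toy_put_budget(dev); /* slot reserved but no request */
			break;
		}
		sched->pending--;            /* dequeue one request... */
		dispatched++;                /* ...and hand it to the driver */
	}
	return dispatched;
}
```

With `queue_depth = 3` and 10 pending requests, one dispatch run issues 3 requests and leaves the other 7 in the scheduler, still available for merging, instead of parking a dequeued request on a dispatch list after a BLK_STS_RESOURCE.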
> >>>>
> >>>> I can't help but think that it would be cleaner to just be able to
> >>>> reinsert the request into the scheduler properly, if we fail to
> >>>> dispatch it. Bart hinted at that earlier as well.
> >>>
> >>> Actually, when I started investigating the issue, the first thing I tried
> >>> was reinsertion, but that performed even worse on qla2xxx.
> >>>
> >>> Once a request is dequeued, the chance of merging it drops a lot.
> >>> With the none scheduler, merging becomes impossible because
> >>> we only try to merge against the last 8 requests. With mq-deadline,
> >>> while one request is being reinserted, another request may be dequeued
> >>> at the same time.
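Why a bounded merge window hurts can be shown with a small model of a backward merge scan. This is a hypothetical sketch: `struct toy_rq`, `toy_try_back_merge()` and `TOY_MERGE_SCAN_LIMIT` are invented names, the limit of 8 is taken from the statement above, and the real kernel merge path also handles front merges, size limits and other checks that are omitted here. It only shows that a request outside the scanned window (or already dequeued) can never be a merge target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MERGE_SCAN_LIMIT 8u  /* "the last 8 requests", per the mail */

/* Hypothetical queued request covering [sector, sector + nr_sectors). */
struct toy_rq {
	unsigned long sector;
	unsigned long nr_sectors;
};

/* Try to back-merge a bio [sector, sector + nr) into one of the most
 * recently queued requests, scanning newest first. Only the newest
 * TOY_MERGE_SCAN_LIMIT entries are examined. */
static bool toy_try_back_merge(struct toy_rq *rqs, size_t count,
			       unsigned long sector, unsigned long nr)
{
	size_t checked = 0;

	assert(nr > 0);  /* a bio covers at least one sector */

	for (size_t i = count; i > 0 && checked < TOY_MERGE_SCAN_LIMIT;
	     i--, checked++) {
		struct toy_rq *rq = &rqs[i - 1];

		if (rq->sector + rq->nr_sectors == sector) {
			rq->nr_sectors += nr;  /* bio appended to rq */
			return true;
		}
	}
	return false;  /* no contiguous request inside the window */
}
```

With 10 queued requests, a bio contiguous with the oldest request is not merged even though a perfect candidate exists, because that request is the 10th from the end.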
> >>
> >> I don't care too much about 'none'. If perfect merging is crucial for
> >> getting to the performance level you want on the hardware you are using,
> >> you should not be using 'none'. 'none' will work perfectly fine for NVMe-style
> >> devices, where we are not dependent on merging to the same
> >> extent that we are on other devices.
> >>
> >> mq-deadline reinsertion will be expensive; that's in the nature of that
> >> beast. It's basically the same as a normal request insertion. So for
> >> that, we'd have to be a bit careful not to run into this too much. Even
> >> with a dumb approach, it should only happen 1 out of N times, where N is
> >> the typical point at which the device will return STS_RESOURCE. The
> >> reinsertion vs dequeue should be serialized with your patch to do that,
> >> at least for the single queue mq-deadline setup. In fact, I think your
> >> approach suffers from that same basic race, in that the budget isn't a
> >> hard allocation, it's just a hint. It can change between the time you check
> >> it and the time you go and dispatch the IO, if you don't serialize that
> >> part. So it really should be no different in that regard.
> >
> > For SCSI, .get_budget is implemented as an atomic counter,
> > so it is fully effective at avoiding unnecessary dequeues; please take
> > a look at patch 6.
>
> Looks like you are right, I had initially misread that as just checking
> the busy count. But you are actually getting the count at that point,
> so it should be solid.
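The difference between Jens' "hint" reading and Ming's "atomic counting" answer is the difference between reading a counter and reserving a slot. The sketch below uses C11 atomics. `struct toy_sdev`, `toy_budget_hint()`, `toy_sdev_get_budget()` and `toy_sdev_put_budget()` are hypothetical; the field names `device_busy` and `queue_depth` are borrowed from the SCSI device structure for readability, but this is not the code from patch 6.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-device state shared by all submitting CPUs. */
struct toy_sdev {
	atomic_int device_busy;  /* commands currently in flight */
	int queue_depth;         /* per-device limit */
};

/* "Hint" variant: only reads the counter. Another CPU can take the last
 * slot between this check and the actual dispatch, so a true result
 * guarantees nothing. */
static bool toy_budget_hint(struct toy_sdev *sdev)
{
	return atomic_load(&sdev->device_busy) < sdev->queue_depth;
}

/* Reservation variant: increment first, then back out if over the limit.
 * A true result means the caller really holds one slot. */
static bool toy_sdev_get_budget(struct toy_sdev *sdev)
{
	int busy = atomic_fetch_add(&sdev->device_busy, 1);

	if (busy >= sdev->queue_depth) {
		atomic_fetch_sub(&sdev->device_busy, 1);  /* undo */
		return false;
	}
	return true;
}

/* Release a slot obtained from toy_sdev_get_budget(). */
static void toy_sdev_put_budget(struct toy_sdev *sdev)
{
	int prev = atomic_fetch_sub(&sdev->device_busy, 1);

	assert(prev > 0);
	(void)prev;  /* silence unused warning when NDEBUG is set */
}
```

Because the increment and the limit check are done on the value returned by a single atomic operation, two CPUs racing for the last slot cannot both succeed; the loser sees the raised count and backs out before dequeuing anything.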
>
> >>> Not to mention the cost of acquiring/releasing the lock; that
> >>> is just useless work that wastes CPU.
> >>
> >> Sure, my point is that if it doesn't happen too often, it doesn't really
> >> matter. It's not THAT expensive.
> >
> > Actually it is in the hot path. For example, the queue depth of lpfc and
> > qla2xxx is 3, so it is quite easy to trigger STS_RESOURCE.
>
> Ugh, that is low.
>
> OK, I think we should just roll with this and see how far we can go. I'll
> apply it for 4.15.
OK, I have some updates and will post a new version soon.
Thanks
Ming
Thread overview: 32+ messages
2017-10-12 18:36 [PATCH V7 0/6] blk-mq-sched: improve sequential I/O performance Ming Lei
2017-10-12 18:36 ` [PATCH V7 1/6] blk-mq-sched: fix scheduler bad performance Ming Lei
2017-10-12 18:37 ` [PATCH V7 2/6] blk-mq-sched: move actual dispatching into one helper Ming Lei
2017-10-12 18:37 ` [PATCH V7 3/6] sbitmap: introduce __sbitmap_for_each_set() Ming Lei
2017-10-12 18:37 ` [PATCH V7 4/6] blk-mq: introduce .get_budget and .put_budget in blk_mq_ops Ming Lei
2017-10-12 18:46 ` Jens Axboe
2017-10-13 0:19 ` Ming Lei
2017-10-13 14:44 ` Jens Axboe
2017-10-13 16:07 ` Ming Lei
2017-10-13 16:19 ` Jens Axboe
2017-10-13 16:21 ` Ming Lei [this message]
2017-10-13 16:28 ` Jens Axboe
2017-10-13 16:31 ` Bart Van Assche
2017-10-13 16:33 ` Jens Axboe
2017-10-13 16:45 ` Ming Lei
2017-10-13 17:08 ` Bart Van Assche
2017-10-13 17:29 ` Ming Lei
2017-10-13 17:47 ` Bart Van Assche
2017-10-16 11:30 ` Hannes Reinecke
2017-10-16 16:06 ` Bart Van Assche
2017-10-17 1:29 ` Ming Lei
2017-10-17 6:38 ` Hannes Reinecke
2017-10-17 9:36 ` Ming Lei
2017-10-17 18:09 ` Bart Van Assche
2017-10-13 16:17 ` Ming Lei
2017-10-13 16:20 ` Jens Axboe
2017-10-13 16:22 ` Ming Lei
2017-10-13 16:28 ` Jens Axboe
2017-10-12 18:37 ` [PATCH V7 5/6] blk-mq-sched: improve dispatching from sw queue Ming Lei
2017-10-12 18:48 ` Bart Van Assche
2017-10-13 0:20 ` Ming Lei
2017-10-12 18:37 ` [PATCH V7 6/6] SCSI: implement .get_budget and .put_budget for blk-mq Ming Lei