From: Ming Lei <ming.lei@redhat.com>
To: Mike Snitzer <snitzer@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>,
linux-block@vger.kernel.org,
Christoph Hellwig <hch@infradead.org>,
linux-scsi@vger.kernel.org,
Laurence Oberman <loberman@redhat.com>,
Bart Van Assche <bart.vanassche@sandisk.com>
Subject: Re: blk-mq: introduce BLK_STS_DEV_RESOURCE
Date: Sun, 21 Jan 2018 09:32:54 +0800 [thread overview]
Message-ID: <20180121013248.GA13715@ming.t460p> (raw)
In-Reply-To: <20180121005741.GB9684@ming.t460p>
On Sun, Jan 21, 2018 at 08:57:41AM +0800, Ming Lei wrote:
> On Sat, Jan 20, 2018 at 12:30:02PM -0500, Mike Snitzer wrote:
> > On Sat, Jan 20 2018 at 8:48am -0500,
> > Ming Lei <ming.lei@redhat.com> wrote:
> >
> > > This status is returned from driver to block layer if device related
> > > resource is run out of, but driver can guarantee that IO dispatch is
> > > triggered in future when the resource is available.
> > >
> > > This patch converts some drivers to use this return value. Meantime
> > > if driver returns BLK_STS_RESOURCE and S_SCHED_RESTART is marked, run
> > > queue after 10ms for avoiding IO hang.
> > >
> > > Suggested-by: Jens Axboe <axboe@kernel.dk>
> > > Cc: Mike Snitzer <snitzer@redhat.com>
> > > Cc: Laurence Oberman <loberman@redhat.com>
> > > Cc: Bart Van Assche <bart.vanassche@sandisk.com>
> > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > ---
> > > block/blk-core.c | 1 +
> > > block/blk-mq.c | 20 ++++++++++++++++----
> > > drivers/block/null_blk.c | 2 +-
> > > drivers/block/virtio_blk.c | 2 +-
> > > drivers/block/xen-blkfront.c | 2 +-
> > > drivers/scsi/scsi_lib.c | 6 +++---
> > > include/linux/blk_types.h | 7 +++++++
> > > 7 files changed, 30 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > > index 01f271d40825..6e97e0bf8178 100644
> > > --- a/block/blk-mq.c
> > > +++ b/block/blk-mq.c
> > > @@ -1226,7 +1226,8 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
> > > }
> > >
> > > ret = q->mq_ops->queue_rq(hctx, &bd);
> > > - if (ret == BLK_STS_RESOURCE) {
> > > + if ((ret == BLK_STS_RESOURCE) ||
> > > + (ret == BLK_STS_DEV_RESOURCE)) {
> > > /*
> > > * If an I/O scheduler has been configured and we got a
> > > * driver tag for the next request already, free it
> >
> > Just a nit, but this should be on one line.
>
> It is too long, and my editor starts to highlight/complain it, :-)
>
> >
> > > @@ -1764,6 +1775,7 @@ static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx,
> > > *cookie = new_cookie;
> > > break;
> > > case BLK_STS_RESOURCE:
> > > + case BLK_STS_DEV_RESOURCE:
> > > __blk_mq_requeue_request(rq);
> > > break;
> > > default:
> >
> > It seems the strategy for BLK_STS_DEV_RESOURCE and BLK_STS_RESOURCE is
> > too muddled: calling __blk_mq_requeue_request() for both will cause
> > underlying blk-mq driver to retain the request, won't it?
>
> blk_mq_request_issue_directly() is used by driver(dm-rq) on underlying
> queue, and driver need to deal with underlying queue busy, now we simply
> free the (underlying)request and feedback the busy status to blk-mq via
> dm-rq.
>
> Except for blk_mq_request_issue_directly(), this request need to be
> requeued, and is retained by blk-mq in hctx->dispatch_list.
>
> The difference is that if driver returns BLK_STS_DEV_RESOURCE, the queue
> will be rerun when resource is available, so don't need to run queue after
> a delay for avoiding IO hang explicitly.
>
> >
> > > @@ -1826,7 +1838,7 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
> > > hctx_lock(hctx, &srcu_idx);
> > >
> > > ret = __blk_mq_try_issue_directly(hctx, rq, cookie, false);
> > > - if (ret == BLK_STS_RESOURCE)
> > > + if ((ret == BLK_STS_RESOURCE) || (ret == BLK_STS_DEV_RESOURCE))
> > > blk_mq_sched_insert_request(rq, false, true, false);
> > > else if (ret != BLK_STS_OK)
> > > blk_mq_end_request(rq, ret);
> >
> > For this normal (non dm-mpath) case the request gets re-inserted;
> > dm-mpath must avoid that.
> >
> > But with dm-mpath, which instead uses blk_mq_request_issue_directly(),
> > we're driving IO with stacked blk-mq drivers. If the underlying blk-mq
> > driver (e.g. scsi-mq or nvme) is made to retain the request, using
> > __blk_mq_issue_directly()'s call to __blk_mq_requeue_request() above,
> > then dm-mpath will not have the ability to requeue the request without
> > conflicting with the underlying blk-mq driver, will it?
>
> No, as I explained, the exception is blk_mq_request_issue_directly(),
> and now dm-rq simply frees it(and in my original version, this request
> is cached for underlying queue, and reused in next dispatch), for others,
> the request is retained in hctx->dispatch_list, and owned by blk-mq.
>
> >
> > Or am I'm misunderstanding what __blk_mq_requeue_request() is doing?
> >
> > dm_mq_queue_rq
> > -> multipath_clone_and_map
> > -> blk_get_request (scsi_mq)
> > -> if error, dm-mpath conditionally requeues (w/ or w/o delay)
>
> Yes, with this patch, most of times blk-mq will run queue w/ delay
> because SCHED_RESTART is set after the 1st STS_RESOURCE from dm-rq
> .queue_rq()
>
> > -> if BLK_STS_OK then blk_mq_request_issue_directly() gets called
> > -> dm_dispatch_clone_request
> > -> blk_mq_request_issue_directly
> > -> __blk_mq_try_issue_directly
> > -> __blk_mq_issue_directly
> > -> q->mq_ops->queue_rq (this is the underlying scsi_mq)
> > -> a BLK_STS_RESOURCE return here is how Bart was able to cause stalls
>
> The stall only happens when SCHED_RESTART is set and the dm-rq queue is
> idle(no any in-flight requests), that is exactly what this patch tries to
> address as suggested by Jens.
>
> > -> __blk_mq_requeue_request, if BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE **1
> > -> (return from blk_mq_request_issue_directly)
> > -> if BLK_STS_RESOURCE, the dm-mpath request is released using blk_put_request();
> > and DM_MAPIO_REQUEUE is returned to dm_mq_queue_rq **2
>
> Right.
>
> > -> if DM_MAPIO_REQUEUE return from map_request()'s call to dm_dispatch_clone_request:
> > BLK_STS_RESOURCE is returned from dm-mpath's dm_mq_queue_rq
>
> Right.
>
> >
> > The redundant queueing (both to underlying blk-mq at **1 above, and
> > upper layer blk-mq at **2 above) is what I'm concerned about.
> >
> > Hope this is clear.
>
> Yeah, it is quite clear.
>
> I also have other dm-mpath specific questions:
>
> 1) when any STS_RESOURCE is returned from underlying queue either
> because of blk_get_request() or underlying .queue_rq() for a while,
> will dm-mpath try to switch to other path?
>
> 2) what is the basic path switch policy of dm-mpath?
>
> 3) is it possible to move the check of 'ti->type->busy' into
> .clone_and_map_rq()? if it is possible, this way might be more
> effective to detect underlying queue busy.
>
> Actually this patch may has another issue: if the default run queue
> delay(in this patch, it is 10ms) is too short, the timer may expire
> before any in-flight underlying request completes, then we may
> dequeue too quick, and sequential IO performance can be hurt too.
>
> But my previous patch in github doesn't have this issue.
>
> https://github.com/ming1/linux/commit/dfd672c998283a110247152237a9916b8264f3ec
>
> Jens, what do you think of this issue? Or do we need to worry about
> it?
Just forget we have discussed to introduce blk_get_request_notify() which
can address this issue too by returning BLK_STS_DEV_RESOURCE in case
blk_get_request_notify() gets NULL.
--
Ming
next prev parent reply other threads:[~2018-01-21 1:32 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-01-20 13:48 [PATCH] blk-mq: introduce BLK_STS_DEV_RESOURCE Ming Lei
2018-01-20 17:30 ` Mike Snitzer
2018-01-21 0:57 ` Ming Lei
2018-01-21 1:32 ` Ming Lei [this message]
2018-01-21 3:52 ` Mike Snitzer
2018-01-22 3:18 ` Ming Lei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180121013248.GA13715@ming.t460p \
--to=ming.lei@redhat.com \
--cc=axboe@kernel.dk \
--cc=bart.vanassche@sandisk.com \
--cc=hch@infradead.org \
--cc=linux-block@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=loberman@redhat.com \
--cc=snitzer@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.