From: Mike Snitzer <snitzer@redhat.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>,
linux-block@vger.kernel.org,
Christoph Hellwig <hch@infradead.org>,
linux-scsi@vger.kernel.org,
Laurence Oberman <loberman@redhat.com>,
Bart Van Assche <bart.vanassche@sandisk.com>
Subject: Re: blk-mq: introduce BLK_STS_DEV_RESOURCE
Date: Sat, 20 Jan 2018 12:30:02 -0500 [thread overview]
Message-ID: <20180120173001.GA3563@redhat.com> (raw)
In-Reply-To: <20180120134813.4446-1-ming.lei@redhat.com>
On Sat, Jan 20 2018 at 8:48am -0500,
Ming Lei <ming.lei@redhat.com> wrote:
> This status is returned from driver to block layer if device related
> resource is run out of, but driver can guarantee that IO dispatch is
> triggered in future when the resource is available.
>
> This patch converts some drivers to use this return value. Meantime
> if driver returns BLK_STS_RESOURCE and S_SCHED_RESTART is marked, run
> queue after 10ms for avoiding IO hang.
>
> Suggested-by: Jens Axboe <axboe@kernel.dk>
> Cc: Mike Snitzer <snitzer@redhat.com>
> Cc: Laurence Oberman <loberman@redhat.com>
> Cc: Bart Van Assche <bart.vanassche@sandisk.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
> block/blk-core.c | 1 +
> block/blk-mq.c | 20 ++++++++++++++++----
> drivers/block/null_blk.c | 2 +-
> drivers/block/virtio_blk.c | 2 +-
> drivers/block/xen-blkfront.c | 2 +-
> drivers/scsi/scsi_lib.c | 6 +++---
> include/linux/blk_types.h | 7 +++++++
> 7 files changed, 30 insertions(+), 10 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 01f271d40825..6e97e0bf8178 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1226,7 +1226,8 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
> }
>
> ret = q->mq_ops->queue_rq(hctx, &bd);
> - if (ret == BLK_STS_RESOURCE) {
> + if ((ret == BLK_STS_RESOURCE) ||
> + (ret == BLK_STS_DEV_RESOURCE)) {
> /*
> * If an I/O scheduler has been configured and we got a
> * driver tag for the next request already, free it
Just a nit, but this should be on one line.
> @@ -1764,6 +1775,7 @@ static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx,
> *cookie = new_cookie;
> break;
> case BLK_STS_RESOURCE:
> + case BLK_STS_DEV_RESOURCE:
> __blk_mq_requeue_request(rq);
> break;
> default:
It seems the strategy for BLK_STS_DEV_RESOURCE and BLK_STS_RESOURCE is
too muddled: calling __blk_mq_requeue_request() for both will cause
underlying blk-mq driver to retain the request, won't it?
> @@ -1826,7 +1838,7 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
> hctx_lock(hctx, &srcu_idx);
>
> ret = __blk_mq_try_issue_directly(hctx, rq, cookie, false);
> - if (ret == BLK_STS_RESOURCE)
> + if ((ret == BLK_STS_RESOURCE) || (ret == BLK_STS_DEV_RESOURCE))
> blk_mq_sched_insert_request(rq, false, true, false);
> else if (ret != BLK_STS_OK)
> blk_mq_end_request(rq, ret);
For this normal (non dm-mpath) case the request gets re-inserted;
dm-mpath must avoid that.
But with dm-mpath, which instead uses blk_mq_request_issue_directly(),
we're driving IO with stacked blk-mq drivers. If the underlying blk-mq
driver (e.g. scsi-mq or nvme) is made to retain the request, using
__blk_mq_issue_directly()'s call to __blk_mq_requeue_request() above,
then dm-mpath will not have the ability to requeue the request without
conflicting with the underlying blk-mq driver, will it?
Or am I'm misunderstanding what __blk_mq_requeue_request() is doing?
dm_mq_queue_rq
-> multipath_clone_and_map
-> blk_get_request (scsi_mq)
-> if error, dm-mpath conditionally requeues (w/ or w/o delay)
-> if BLK_STS_OK then blk_mq_request_issue_directly() gets called
-> dm_dispatch_clone_request
-> blk_mq_request_issue_directly
-> __blk_mq_try_issue_directly
-> __blk_mq_issue_directly
-> q->mq_ops->queue_rq (this is the underlying scsi_mq)
-> a BLK_STS_RESOURCE return here is how Bart was able to cause stalls
-> __blk_mq_requeue_request, if BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE **1
-> (return from blk_mq_request_issue_directly)
-> if BLK_STS_RESOURCE, the dm-mpath request is released using blk_put_request();
and DM_MAPIO_REQUEUE is returned to dm_mq_queue_rq **2
-> if DM_MAPIO_REQUEUE return from map_request()'s call to dm_dispatch_clone_request:
BLK_STS_RESOURCE is returned from dm-mpath's dm_mq_queue_rq
The redundant queueing (both to underlying blk-mq at **1 above, and
upper layer blk-mq at **2 above) is what I'm concerned about.
Hope this is clear.
I'd love to be missing something, please advise.
Thanks,
Mike
next prev parent reply other threads:[~2018-01-20 17:30 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-01-20 13:48 [PATCH] blk-mq: introduce BLK_STS_DEV_RESOURCE Ming Lei
2018-01-20 17:30 ` Mike Snitzer [this message]
2018-01-21 0:57 ` Ming Lei
2018-01-21 1:32 ` Ming Lei
2018-01-21 3:52 ` Mike Snitzer
2018-01-22 3:18 ` Ming Lei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180120173001.GA3563@redhat.com \
--to=snitzer@redhat.com \
--cc=axboe@kernel.dk \
--cc=bart.vanassche@sandisk.com \
--cc=hch@infradead.org \
--cc=linux-block@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=loberman@redhat.com \
--cc=ming.lei@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).