linux-block.vger.kernel.org archive mirror
From: Mike Snitzer <snitzer@redhat.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org,
	Christoph Hellwig <hch@infradead.org>,
	linux-scsi@vger.kernel.org,
	Laurence Oberman <loberman@redhat.com>,
	Bart Van Assche <bart.vanassche@sandisk.com>
Subject: Re: blk-mq: introduce BLK_STS_DEV_RESOURCE
Date: Sat, 20 Jan 2018 12:30:02 -0500
Message-ID: <20180120173001.GA3563@redhat.com>
In-Reply-To: <20180120134813.4446-1-ming.lei@redhat.com>

On Sat, Jan 20 2018 at  8:48am -0500,
Ming Lei <ming.lei@redhat.com> wrote:

> This status is returned from driver to block layer if device related
> resource is run out of, but driver can guarantee that IO dispatch is
> triggered in future when the resource is available.
> 
> This patch converts some drivers to use this return value. Meantime
> if driver returns BLK_STS_RESOURCE and S_SCHED_RESTART is marked, run
> queue after 10ms for avoiding IO hang.
> 
> Suggested-by: Jens Axboe <axboe@kernel.dk>
> Cc: Mike Snitzer <snitzer@redhat.com>
> Cc: Laurence Oberman <loberman@redhat.com>
> Cc: Bart Van Assche <bart.vanassche@sandisk.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/blk-core.c             |  1 +
>  block/blk-mq.c               | 20 ++++++++++++++++----
>  drivers/block/null_blk.c     |  2 +-
>  drivers/block/virtio_blk.c   |  2 +-
>  drivers/block/xen-blkfront.c |  2 +-
>  drivers/scsi/scsi_lib.c      |  6 +++---
>  include/linux/blk_types.h    |  7 +++++++
>  7 files changed, 30 insertions(+), 10 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 01f271d40825..6e97e0bf8178 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1226,7 +1226,8 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
>  		}
>  
>  		ret = q->mq_ops->queue_rq(hctx, &bd);
> -		if (ret == BLK_STS_RESOURCE) {
> +		if ((ret == BLK_STS_RESOURCE) ||
> +				(ret == BLK_STS_DEV_RESOURCE)) {
>  			/*
>  			 * If an I/O scheduler has been configured and we got a
>  			 * driver tag for the next request already, free it

Just a nit, but this should be on one line.

> @@ -1764,6 +1775,7 @@ static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx,
>  		*cookie = new_cookie;
>  		break;
>  	case BLK_STS_RESOURCE:
> +	case BLK_STS_DEV_RESOURCE:
>  		__blk_mq_requeue_request(rq);
>  		break;
>  	default:

It seems the strategy for BLK_STS_DEV_RESOURCE and BLK_STS_RESOURCE is
too muddled: calling __blk_mq_requeue_request() for both will cause the
underlying blk-mq driver to retain the request, won't it?

> @@ -1826,7 +1838,7 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
>  	hctx_lock(hctx, &srcu_idx);
>  
>  	ret = __blk_mq_try_issue_directly(hctx, rq, cookie, false);
> -	if (ret == BLK_STS_RESOURCE)
> +	if ((ret == BLK_STS_RESOURCE) || (ret == BLK_STS_DEV_RESOURCE))
>  		blk_mq_sched_insert_request(rq, false, true, false);
>  	else if (ret != BLK_STS_OK)
>  		blk_mq_end_request(rq, ret);

For this normal (non dm-mpath) case the request gets re-inserted;
dm-mpath must avoid that.

But with dm-mpath, which instead uses blk_mq_request_issue_directly(),
we're driving IO with stacked blk-mq drivers.  If the underlying blk-mq
driver (e.g. scsi-mq or nvme) is made to retain the request, via
__blk_mq_issue_directly()'s call to __blk_mq_requeue_request() above,
then dm-mpath will not be able to requeue the request without
conflicting with the underlying blk-mq driver, will it?

Or am I misunderstanding what __blk_mq_requeue_request() is doing?

dm_mq_queue_rq
-> multipath_clone_and_map
   -> blk_get_request (scsi_mq)
      -> if error, dm-mpath conditionally requeues (w/ or w/o delay)
      -> if BLK_STS_OK then blk_mq_request_issue_directly() gets called
-> dm_dispatch_clone_request
   -> blk_mq_request_issue_directly
      -> __blk_mq_try_issue_directly
         -> __blk_mq_issue_directly
            -> q->mq_ops->queue_rq (this is the underlying scsi_mq)
               -> a BLK_STS_RESOURCE return here is how Bart was able to cause stalls
            -> __blk_mq_requeue_request, if BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE **1
   -> (return from blk_mq_request_issue_directly)
   -> if BLK_STS_RESOURCE, the dm-mpath request is released using blk_put_request();
                           and DM_MAPIO_REQUEUE is returned to dm_mq_queue_rq **2
-> if DM_MAPIO_REQUEUE return from map_request()'s call to dm_dispatch_clone_request:
   BLK_STS_RESOURCE is returned from dm-mpath's dm_mq_queue_rq

The redundant queueing (both to underlying blk-mq at **1 above, and
upper layer blk-mq at **2 above) is what I'm concerned about.
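
To make the concern concrete, here is a toy user-space model of the
trace above.  It is not kernel code: every toy_* name is hypothetical,
the status values are arbitrary, and it simply assumes that
__blk_mq_requeue_request() leaves the lower layer responsible for the
request, which is exactly the point I'm unsure about.  It only follows
the BLK_STS_RESOURCE branch of the trace at the upper layer.

```c
#include <stdbool.h>

/* Toy status codes named after the ones in the patch; values are arbitrary. */
enum toy_status { TOY_STS_OK, TOY_STS_RESOURCE, TOY_STS_DEV_RESOURCE };

/* Hypothetical bookkeeping: which layers believe they must re-drive the IO. */
struct toy_owners {
	bool lower_requeued;	/* **1: lower queue took requeue action */
	bool upper_requeued;	/* **2: dm-mpath asked its own queue to requeue */
};

/* Stand-in for the underlying driver's ->queue_rq(); returns what we force. */
static enum toy_status toy_lower_queue_rq(enum toy_status forced)
{
	return forced;
}

/* Stand-in for __blk_mq_issue_directly() on the lower queue. */
static enum toy_status toy_issue_directly(struct toy_owners *o,
					  enum toy_status forced)
{
	enum toy_status ret = toy_lower_queue_rq(forced);

	/* Assumption: models __blk_mq_requeue_request() as "lower keeps it". */
	if (ret == TOY_STS_RESOURCE || ret == TOY_STS_DEV_RESOURCE)
		o->lower_requeued = true;
	return ret;
}

/* Stand-in for dm-mpath's dispatch path; true models DM_MAPIO_REQUEUE. */
static bool toy_dm_dispatch(struct toy_owners *o, enum toy_status forced)
{
	if (toy_issue_directly(o, forced) == TOY_STS_RESOURCE) {
		/* Clone is released and the original request is requeued. */
		o->upper_requeued = true;
		return true;
	}
	return false;
}
```

Calling toy_dm_dispatch() with TOY_STS_RESOURCE forced leaves both
lower_requeued and upper_requeued set, which is the redundancy I mean.
If __blk_mq_requeue_request() does not actually make the lower layer
retain the request, the model's assumption is wrong and so is my worry.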

Hope this is clear.

I'd love to be missing something; please advise.

Thanks,
Mike


