All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Mike Snitzer <snitzer@redhat.com>
Cc: "axboe@kernel.dk" <axboe@kernel.dk>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"martin.petersen@oracle.com" <martin.petersen@oracle.com>,
	"axboe@fb.com" <axboe@fb.com>,
	"hch@infradead.org" <hch@infradead.org>,
	"dm-devel@redhat.com" <dm-devel@redhat.com>,
	Bart Van Assche <Bart.VanAssche@wdc.com>
Subject: Re: [PATCH V3 0/5] dm-rq: improve sequential I/O performance
Date: Sat, 13 Jan 2018 23:04:35 +0800	[thread overview]
Message-ID: <20180113150434.GC8013@ming.t460p> (raw)
In-Reply-To: <20180112223117.GA6332@redhat.com>

On Fri, Jan 12, 2018 at 05:31:17PM -0500, Mike Snitzer wrote:
> On Fri, Jan 12 2018 at  1:54pm -0500,
> Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> 
> > On Fri, 2018-01-12 at 13:06 -0500, Mike Snitzer wrote:
> > > OK, you have the stage: please give me a pointer to your best
> > > explaination of the several.
> > 
> > Since the previous discussion about this topic occurred more than a month
> > ago it could take more time to look up an explanation than to explain it
> > again. Anyway, here we go. As you know a block layer request queue needs to
> > be rerun if one or more requests are waiting and a previous condition that
> > prevented the request to be executed has been cleared. For the dm-mpath
> > driver, examples of such conditions are no tags available, a path that is
> > busy (see also pgpath_busy()), path initialization that is in progress
> > (pg_init_in_progress) or a request completes with status, e.g. if the
> > SCSI core calls __blk_mq_end_request(req, error) with error != 0. For some
> > of these conditions, e.g. path initialization completes, a callback
> > function in the dm-mpath driver is called and it is possible to explicitly
> > rerun the queue. I agree that for such scenario's a delayed queue run should
> > not be triggered. For other scenario's, e.g. if a SCSI initiator submits a
> > SCSI request over a fabric and the SCSI target replies with "BUSY" then the
> > SCSI core will end the I/O request with status BLK_STS_RESOURCE after the
> > maximum number of retries has been reached (see also scsi_io_completion()).
> > In that last case, if a SCSI target sends a "BUSY" reply over the wire back
> > to the initiator, there is no other approach for the SCSI initiator to
> > figure out whether it can queue another request than to resubmit the
> > request. The worst possible strategy is to resubmit a request immediately
> > because that will cause a significant fraction of the fabric bandwidth to
> > be used just for replying "BUSY" to requests that can't be processed
> > immediately.
> 
> The thing is, the 2 scenarios you are most concerned about have
> _nothing_ to do with dm_mq_queue_rq() at all.  They occur as an error in
> the IO path _after_ the request is successfully retrieved with
> blk_get_request() and then submitted.
>  
> > The intention of commit 6077c2d706097c0 was to address the last mentioned
> > case.
> 
> So it really makes me question why you think commit 6077c2d706097c0
> addresses the issue you think it does.  And gives me more conviction to
> remove 6077c2d706097c0.
> 
> It may help just by virtue of blindly kicking the queue if
> blk_get_request() fails (regardless of the target is responding with
> BUSY or not).  Very unsatisfying to say the least.
> 
> I think it'd be much more beneficial for dm-mpath.c:multipath_end_io()
> to be trained to be respond more intelligently to BLK_STS_RESOURCE.
> 
> E.g. set BLK_MQ_S_SCHED_RESTART if requests are known to be outstanding
> on the path.  This is one case where Ming said the queue would be
> re-run, as detailed in this header:
> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.16&id=5b18cff4baedde77e0d69bd62a13ae78f9488d89
> 
> And Jens has reinforced to me that BLK_MQ_S_SCHED_RESTART is a means to
> kicking the queue more efficiently.  _BUT_ I'm not seeing any external
> blk-mq interface that exposes this capability to a blk-mq driver.  As is
> BLK_MQ_S_SCHED_RESTART gets set very much in the bowels of blk-mq
> (blk_mq_sched_mark_restart_hctx).
> 
> SO I have to do more homework here...
> 
> Ming or Jens: might you be able to shed some light on how dm-mpath
> would/could set BLK_MQ_S_SCHED_RESTART?  A new function added that can

When BLK_STS_RESOURCE is returned from .queue_rq(), blk_mq_dispatch_rq_list()
will check if BLK_MQ_S_SCHED_RESTART is set.

If it has been set, the queue won't be rerun for this request, and the queue
will be rerun until one in-flight request is completed, see blk_mq_sched_restart()
which is called from blk_mq_free_request().

If BLK_MQ_S_SCHED_RESTART isn't set, queue is rerun in blk_mq_dispatch_rq_list(),
and BLK_MQ_S_SCHED_RESTART is set before calling .queue_rq(), see
blk_mq_sched_mark_restart_hctx() which is called in blk_mq_sched_dispatch_requests().

This mechanism can avoid continuous running queue in case of STS_RESOURCE, that
means drivers wouldn't worry about that by adding random delay.


-- 
Ming

WARNING: multiple messages have this Message-ID (diff)
From: Ming Lei <ming.lei@redhat.com>
To: Mike Snitzer <snitzer@redhat.com>
Cc: Bart Van Assche <Bart.VanAssche@wdc.com>,
	"dm-devel@redhat.com" <dm-devel@redhat.com>,
	"hch@infradead.org" <hch@infradead.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"axboe@kernel.dk" <axboe@kernel.dk>,
	"martin.petersen@oracle.com" <martin.petersen@oracle.com>,
	"axboe@fb.com" <axboe@fb.com>
Subject: Re: [PATCH V3 0/5]  dm-rq: improve sequential I/O performance
Date: Sat, 13 Jan 2018 23:04:35 +0800	[thread overview]
Message-ID: <20180113150434.GC8013@ming.t460p> (raw)
In-Reply-To: <20180112223117.GA6332@redhat.com>

On Fri, Jan 12, 2018 at 05:31:17PM -0500, Mike Snitzer wrote:
> On Fri, Jan 12 2018 at  1:54pm -0500,
> Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> 
> > On Fri, 2018-01-12 at 13:06 -0500, Mike Snitzer wrote:
> > > OK, you have the stage: please give me a pointer to your best
> > > explaination of the several.
> > 
> > Since the previous discussion about this topic occurred more than a month
> > ago it could take more time to look up an explanation than to explain it
> > again. Anyway, here we go. As you know a block layer request queue needs to
> > be rerun if one or more requests are waiting and a previous condition that
> > prevented the request to be executed has been cleared. For the dm-mpath
> > driver, examples of such conditions are no tags available, a path that is
> > busy (see also pgpath_busy()), path initialization that is in progress
> > (pg_init_in_progress) or a request completes with status, e.g. if the
> > SCSI core calls __blk_mq_end_request(req, error) with error != 0. For some
> > of these conditions, e.g. path initialization completes, a callback
> > function in the dm-mpath driver is called and it is possible to explicitly
> > rerun the queue. I agree that for such scenario's a delayed queue run should
> > not be triggered. For other scenario's, e.g. if a SCSI initiator submits a
> > SCSI request over a fabric and the SCSI target replies with "BUSY" then the
> > SCSI core will end the I/O request with status BLK_STS_RESOURCE after the
> > maximum number of retries has been reached (see also scsi_io_completion()).
> > In that last case, if a SCSI target sends a "BUSY" reply over the wire back
> > to the initiator, there is no other approach for the SCSI initiator to
> > figure out whether it can queue another request than to resubmit the
> > request. The worst possible strategy is to resubmit a request immediately
> > because that will cause a significant fraction of the fabric bandwidth to
> > be used just for replying "BUSY" to requests that can't be processed
> > immediately.
> 
> The thing is, the 2 scenarios you are most concerned about have
> _nothing_ to do with dm_mq_queue_rq() at all.  They occur as an error in
> the IO path _after_ the request is successfully retrieved with
> blk_get_request() and then submitted.
>  
> > The intention of commit 6077c2d706097c0 was to address the last mentioned
> > case.
> 
> So it really makes me question why you think commit 6077c2d706097c0
> addresses the issue you think it does.  And gives me more conviction to
> remove 6077c2d706097c0.
> 
> It may help just by virtue of blindly kicking the queue if
> blk_get_request() fails (regardless of the target is responding with
> BUSY or not).  Very unsatisfying to say the least.
> 
> I think it'd be much more beneficial for dm-mpath.c:multipath_end_io()
> to be trained to be respond more intelligently to BLK_STS_RESOURCE.
> 
> E.g. set BLK_MQ_S_SCHED_RESTART if requests are known to be outstanding
> on the path.  This is one case where Ming said the queue would be
> re-run, as detailed in this header:
> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.16&id=5b18cff4baedde77e0d69bd62a13ae78f9488d89
> 
> And Jens has reinforced to me that BLK_MQ_S_SCHED_RESTART is a means to
> kicking the queue more efficiently.  _BUT_ I'm not seeing any external
> blk-mq interface that exposes this capability to a blk-mq driver.  As is
> BLK_MQ_S_SCHED_RESTART gets set very much in the bowels of blk-mq
> (blk_mq_sched_mark_restart_hctx).
> 
> SO I have to do more homework here...
> 
> Ming or Jens: might you be able to shed some light on how dm-mpath
> would/could set BLK_MQ_S_SCHED_RESTART?  A new function added that can

When BLK_STS_RESOURCE is returned from .queue_rq(), blk_mq_dispatch_rq_list()
will check if BLK_MQ_S_SCHED_RESTART is set.

If it has been set, the queue won't be rerun for this request, and the queue
will be rerun until one in-flight request is completed, see blk_mq_sched_restart()
which is called from blk_mq_free_request().

If BLK_MQ_S_SCHED_RESTART isn't set, queue is rerun in blk_mq_dispatch_rq_list(),
and BLK_MQ_S_SCHED_RESTART is set before calling .queue_rq(), see
blk_mq_sched_mark_restart_hctx() which is called in blk_mq_sched_dispatch_requests().

This mechanism can avoid continuous running queue in case of STS_RESOURCE, that
means drivers wouldn't worry about that by adding random delay.


-- 
Ming

  reply	other threads:[~2018-01-13 15:04 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-11  6:01 [PATCH V3 0/5] dm-rq: improve sequential I/O performance Ming Lei
2018-01-11  6:01 ` [PATCH V3 1/5] dm-mpath: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE Ming Lei
2018-01-11  6:01 ` [PATCH V3 2/5] dm-mpath: return DM_MAPIO_REQUEUE in case of rq allocation failure Ming Lei
2018-01-12 19:04   ` Bart Van Assche
2018-01-12 19:04     ` Bart Van Assche
2018-01-13  1:29     ` Ming Lei
2018-01-11  6:01 ` [PATCH V3 3/5] blk-mq: move actual issue into one helper Ming Lei
2018-01-11 22:09   ` Mike Snitzer
2018-01-11  6:01 ` [PATCH V3 4/5] blk-mq: return dispatch result to caller in blk_mq_try_issue_directly Ming Lei
2018-01-11 22:10   ` Mike Snitzer
2018-01-11  6:01 ` [PATCH V3 5/5] blk-mq: issue request directly for blk_insert_cloned_request Ming Lei
2018-01-11 22:42   ` Mike Snitzer
2018-01-11 22:07 ` [PATCH V3 0/5] dm-rq: improve sequential I/O performance Mike Snitzer
2018-01-11 22:37   ` Bart Van Assche
2018-01-11 22:37     ` Bart Van Assche
2018-01-11 22:58     ` Mike Snitzer
2018-01-11 23:27       ` Bart Van Assche
2018-01-11 23:27         ` Bart Van Assche
2018-01-12  1:43         ` Mike Snitzer
2018-01-12  1:42     ` Ming Lei
2018-01-12  1:57       ` Mike Snitzer
2018-01-12  3:33         ` Ming Lei
2018-01-12 17:18           ` Mike Snitzer
2018-01-12 17:26             ` Bart Van Assche
2018-01-12 17:26               ` Bart Van Assche
2018-01-12 17:40               ` Mike Snitzer
2018-01-12 17:46                 ` Bart Van Assche
2018-01-12 17:46                   ` Bart Van Assche
2018-01-12 18:06                   ` Mike Snitzer
2018-01-12 18:54                     ` Bart Van Assche
2018-01-12 18:54                       ` Bart Van Assche
2018-01-12 19:29                       ` Mike Snitzer
2018-01-12 19:53                       ` Elliott, Robert (Persistent Memory)
2018-01-12 19:53                         ` Elliott, Robert (Persistent Memory)
2018-01-13  0:52                         ` Mike Snitzer
2018-01-13  1:00                           ` Bart Van Assche
2018-01-13  1:00                             ` Bart Van Assche
2018-01-13  1:37                             ` Mike Snitzer
2018-01-13 15:14                               ` Mike Snitzer
2018-01-12 22:31                       ` Mike Snitzer
2018-01-13 15:04                         ` Ming Lei [this message]
2018-01-13 15:04                           ` Ming Lei
2018-01-13 15:10                           ` Mike Snitzer
2018-01-12 23:17                       ` Mike Snitzer
2018-01-12 23:42                         ` Bart Van Assche
2018-01-12 23:42                           ` Bart Van Assche
2018-01-13  0:45                           ` Mike Snitzer
2018-01-13 14:34                       ` Ming Lei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180113150434.GC8013@ming.t460p \
    --to=ming.lei@redhat.com \
    --cc=Bart.VanAssche@wdc.com \
    --cc=axboe@fb.com \
    --cc=axboe@kernel.dk \
    --cc=dm-devel@redhat.com \
    --cc=hch@infradead.org \
    --cc=linux-block@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=snitzer@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.