* [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
From: Ming Lei @ 2017-09-15 16:44 UTC
To: dm-devel, Mike Snitzer, Jens Axboe
Cc: linux-block, Christoph Hellwig, Bart Van Assche, Laurence Oberman, Ming Lei, Sagi Grimberg, linux-nvme, James E.J. Bottomley, Martin K. Petersen, linux-scsi

If .queue_rq() returns BLK_STS_RESOURCE, blk-mq reruns the queue in one of
three situations:

1) if BLK_MQ_S_SCHED_RESTART is set
   - the queue is rerun after a request completes; see blk_mq_sched_restart(),
     which is called from blk_mq_free_request()

2) if BLK_MQ_S_TAG_WAITING is set
   - the queue is rerun after a tag is freed

3) otherwise
   - the queue is run immediately in blk_mq_dispatch_rq_list()

So calling blk_mq_delay_run_hw_queue() inside .queue_rq() makes no sense:
whether or not it is called, the queue will be rerun soon via one of the
three paths above, and the busy request can then be dispatched again.

Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-nvme@lists.infradead.org
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/md/dm-rq.c      | 1 -
 drivers/nvme/host/fc.c  | 3 ---
 drivers/scsi/scsi_lib.c | 4 ----
 3 files changed, 8 deletions(-)

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index c6ebc5b1e00e..71422cea1c4a 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -761,7 +761,6 @@ static blk_status_t dm_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
 		/* Undo dm_start_request() before requeuing */
 		rq_end_stats(md, rq);
 		rq_completed(md, rq_data_dir(rq), false);
-		blk_mq_delay_run_hw_queue(hctx, 100/*ms*/);
 		return BLK_STS_RESOURCE;
 	}

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index d2e882c0f496..17727eee5d5f 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2041,9 +2041,6 @@ nvme_fc_start_fcp_op(struct nvme_fc_ctrl *ctrl, struct nvme_fc_queue *queue,
 	return BLK_STS_OK;

 busy:
-	if (!(op->flags & FCOP_FLAGS_AEN) && queue->hctx)
-		blk_mq_delay_run_hw_queue(queue->hctx, NVMEFC_QUEUE_DELAY);
-
 	return BLK_STS_RESOURCE;
 }

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 9cf6a80fe297..71d5bf1345c9 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2010,11 +2010,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 out:
 	switch (ret) {
 	case BLK_STS_OK:
-		break;
 	case BLK_STS_RESOURCE:
-		if (atomic_read(&sdev->device_busy) == 0 &&
-		    !scsi_device_blocked(sdev))
-			blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY);
 		break;
 	default:
 		/*
--
2.9.5

^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE 2017-09-15 16:44 ` [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE Ming Lei @ 2017-09-15 17:57 ` Bart Van Assche 2017-09-17 12:40 ` Ming Lei 0 siblings, 1 reply; 20+ messages in thread From: Bart Van Assche @ 2017-09-15 17:57 UTC (permalink / raw) To: dm-devel@redhat.com, axboe@fb.com, ming.lei@redhat.com, snitzer@redhat.com Cc: linux-block@vger.kernel.org, hch@infradead.org, sagi@grimberg.me, Bart Van Assche, martin.petersen@oracle.com, linux-scsi@vger.kernel.org, linux-nvme@lists.infradead.org, jejb@linux.vnet.ibm.com, loberman@redhat.com On Sat, 2017-09-16 at 00:44 +0800, Ming Lei wrote: > If .queue_rq() returns BLK_STS_RESOURCE, blk-mq will rerun > the queue in the three situations: > > 1) if BLK_MQ_S_SCHED_RESTART is set > - queue is rerun after one rq is completed, see blk_mq_sched_restart() > which is run from blk_mq_free_request() > > 2) BLK_MQ_S_TAG_WAITING is set > - queue is rerun after one tag is freed > > 3) otherwise > - queue is run immediately in blk_mq_dispatch_rq_list() > > So calling blk_mq_delay_run_hw_queue() inside .queue_rq() doesn't make > sense because no matter it is called or not, the queue still will be > rerun soon in above three situations, and the busy req can be dispatched > again. NAK Block drivers call blk_mq_delay_run_hw_queue() if it is known that the queue has to be rerun later even if no request has completed before the delay has expired. This patch breaks at least the SCSI core and the dm-mpath drivers. Bart. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
From: Ming Lei @ 2017-09-17 12:40 UTC
To: Bart Van Assche
Cc: dm-devel@redhat.com, axboe@fb.com, snitzer@redhat.com, linux-block@vger.kernel.org, hch@infradead.org, sagi@grimberg.me, martin.petersen@oracle.com, linux-scsi@vger.kernel.org, linux-nvme@lists.infradead.org, jejb@linux.vnet.ibm.com, loberman@redhat.com

On Fri, Sep 15, 2017 at 05:57:31PM +0000, Bart Van Assche wrote:
> On Sat, 2017-09-16 at 00:44 +0800, Ming Lei wrote:
> > If .queue_rq() returns BLK_STS_RESOURCE, blk-mq will rerun
> > the queue in the three situations:
> >
> > 1) if BLK_MQ_S_SCHED_RESTART is set
> > - queue is rerun after one rq is completed, see blk_mq_sched_restart()
> > which is run from blk_mq_free_request()
> >
> > 2) BLK_MQ_S_TAG_WAITING is set
> > - queue is rerun after one tag is freed
> >
> > 3) otherwise
> > - queue is run immediately in blk_mq_dispatch_rq_list()
> >
> > So calling blk_mq_delay_run_hw_queue() inside .queue_rq() doesn't make
> > sense because no matter it is called or not, the queue still will be
> > rerun soon in above three situations, and the busy req can be dispatched
> > again.
>
> NAK
>
> Block drivers call blk_mq_delay_run_hw_queue() if it is known that the queue
> has to be rerun later even if no request has completed before the delay has
> expired. This patch breaks at least the SCSI core and the dm-mpath drivers.

"if no request has completed before the delay has expired" can't be a
reason to rerun the queue, because the queue can still be busy.

The only case that would justify it is one where the queue is busy while
there aren't any requests in flight, but I'd rather understand what that
exact situation is than keep this kind of workaround. If there is no
such situation, we should remove the call.
For SCSI, it might be reasonable to run the hw queue after an arbitrary
delay when atomic_read(&sdev->device_busy) is zero; that would mean the
queue can be busy even when there aren't any requests in flight in it.
Could you or someone share what that case is? Then we can avoid using
this workaround.

For dm-mpath, it might be related to path state, but I have to say it is
still a workaround.

I suggest understanding the root cause instead of keeping this ugly
arbitrary delay, because rerunning the hw queue after 100ms is useless
99.99% of the time.

-- 
Ming
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
From: Bart Van Assche @ 2017-09-18 15:18 UTC
To: ming.lei@redhat.com
Cc: linux-block@vger.kernel.org, hch@infradead.org, sagi@grimberg.me, snitzer@redhat.com, martin.petersen@oracle.com, linux-scsi@vger.kernel.org, axboe@fb.com, linux-nvme@lists.infradead.org, jejb@linux.vnet.ibm.com, loberman@redhat.com, dm-devel@redhat.com

On Sun, 2017-09-17 at 20:40 +0800, Ming Lei wrote:
> "if no request has completed before the delay has expired" can't be a
> reason to rerun the queue, because the queue can still be busy.

That statement of yours shows that there are important aspects of the
SCSI core and the dm-mpath driver that you don't understand.

> I suggest to understand the root cause, instead of keeping this
> ugly random delay because run hw queue after 100ms may be useless
> in 99.99% times.

If you are still looking at removing the blk_mq_delay_run_hw_queue()
calls then I think you are looking in the wrong direction. What kind of
problem are you trying to solve? Is it perhaps that there can be a delay
between dm-mpath request completion and the queueing of a new request?
If so, adding a queue run call into the dm-mpath end_io callback is
probably sufficient and can probably replace this entire patch series.

Bart.
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
From: Ming Lei @ 2017-09-19 5:43 UTC
To: Bart Van Assche
Cc: hch@infradead.org, jejb@linux.vnet.ibm.com, sagi@grimberg.me, linux-scsi@vger.kernel.org, axboe@fb.com, snitzer@redhat.com, linux-nvme@lists.infradead.org, linux-block@vger.kernel.org, dm-devel@redhat.com, martin.petersen@oracle.com, loberman@redhat.com

On Mon, Sep 18, 2017 at 03:18:16PM +0000, Bart Van Assche wrote:
> On Sun, 2017-09-17 at 20:40 +0800, Ming Lei wrote:
> > "if no request has completed before the delay has expired" can't be a
> > reason to rerun the queue, because the queue can still be busy.
>
> That statement of you shows that there are important aspects of the SCSI
> core and dm-mpath driver that you don't understand.

Then can you tell me why blk-mq's SCHED_RESTART can't cover the rerun
when there are in-flight requests? What is the case in which dm-rq can
return BUSY while there aren't any in-flight requests at the same time?

Also, you are the author of adding 'blk_mq_delay_run_hw_queue(hctx,
100/*ms*/)' to dm-rq, and you never explained in commit 6077c2d706097c0
("dm rq: Avoid that request processing stalls sporadically") what the
root cause of your request stall is or why that patch fixes your issue.
You don't even explain why the delay is 100ms. So it is a workaround,
isn't it?

My concern is that it isn't good to add blk_mq_delay_run_hw_queue(hctx,
100/*ms*/) in the hot path, since that case should already be covered by
SCHED_RESTART if there are in-flight requests.

> > I suggest to understand the root cause, instead of keeping this
> > ugly random delay because run hw queue after 100ms may be useless
> > in 99.99% times.
> If you are still looking at removing the blk_mq_delay_run_hw_queue() calls
> then I think you are looking in the wrong direction. What kind of problem
> are you trying to solve? Is it perhaps that there can be a delay between

Actually the improvement to dm-rq I/O scheduling (patches 2-5) doesn't
need this patch.

-- 
Ming
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE 2017-09-19 5:43 ` Ming Lei @ 2017-09-19 15:36 ` Bart Van Assche 2017-09-19 15:56 ` Mike Snitzer 2017-09-19 15:48 ` Mike Snitzer 1 sibling, 1 reply; 20+ messages in thread From: Bart Van Assche @ 2017-09-19 15:36 UTC (permalink / raw) To: ming.lei@redhat.com Cc: linux-block@vger.kernel.org, hch@infradead.org, sagi@grimberg.me, snitzer@redhat.com, martin.petersen@oracle.com, linux-scsi@vger.kernel.org, axboe@fb.com, linux-nvme@lists.infradead.org, jejb@linux.vnet.ibm.com, loberman@redhat.com, dm-devel@redhat.com On Tue, 2017-09-19 at 13:43 +0800, Ming Lei wrote: > On Mon, Sep 18, 2017 at 03:18:16PM +0000, Bart Van Assche wrote: > > If you are still looking at removing the blk_mq_delay_run_hw_queue() calls > > then I think you are looking in the wrong direction. What kind of problem > > are you trying to solve? Is it perhaps that there can be a delay between > > Actually the improvement on dm-rq IO schedule(the patch 2 ~ 5) doesn't > need this patch. The approach of this patch series looks wrong to me and patch 5/5 is an ugly hack in my opinion. That's why I asked you to drop the entire patch series and to test whether inserting a queue run call into the dm-mpath end_io callback yields a similar performance improvement to the patches you posted. Please do not expect me to spend more time on this patch series if you do not come up with measurement results for the proposed alternative. Bart. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE 2017-09-19 15:36 ` Bart Van Assche @ 2017-09-19 15:56 ` Mike Snitzer 2017-09-19 16:04 ` Ming Lei 0 siblings, 1 reply; 20+ messages in thread From: Mike Snitzer @ 2017-09-19 15:56 UTC (permalink / raw) To: Bart Van Assche Cc: ming.lei@redhat.com, linux-block@vger.kernel.org, hch@infradead.org, sagi@grimberg.me, martin.petersen@oracle.com, linux-scsi@vger.kernel.org, axboe@fb.com, linux-nvme@lists.infradead.org, jejb@linux.vnet.ibm.com, loberman@redhat.com, dm-devel@redhat.com On Tue, Sep 19 2017 at 11:36am -0400, Bart Van Assche <Bart.VanAssche@wdc.com> wrote: > On Tue, 2017-09-19 at 13:43 +0800, Ming Lei wrote: > > On Mon, Sep 18, 2017 at 03:18:16PM +0000, Bart Van Assche wrote: > > > If you are still looking at removing the blk_mq_delay_run_hw_queue() calls > > > then I think you are looking in the wrong direction. What kind of problem > > > are you trying to solve? Is it perhaps that there can be a delay between > > > > Actually the improvement on dm-rq IO schedule(the patch 2 ~ 5) doesn't > > need this patch. > > The approach of this patch series looks wrong to me and patch 5/5 is an ugly > hack in my opinion. That's why I asked you to drop the entire patch series and > to test whether inserting a queue run call into the dm-mpath end_io callback > yields a similar performance improvement to the patches you posted. Please do > not expect me to spend more time on this patch series if you do not come up > with measurement results for the proposed alternative. Bart, asserting that Ming's work is a hack doesn't help your apparent goal of deligitimizing Ming's work. Nor does it take away from the fact that your indecision on appropriate timeouts (let alone ability to defend and/or explain them) validates Ming calling them into question (which you are now dodging regularly). But please don't take this feedback and shut-down. Instead please work together more constructively. 
This doesn't need to be adversarial! I am at a loss for why there is such animosity from your end Bart. Please dial it back. It is just a distraction that fosters needless in-fighting. Believe it or not: Ming is just trying to improve the code because he has a testbed that is showing fairly abysmal performance with dm-mq multipath (on lpfc with scsi-mq). Ming, if you can: please see if what Bart has proposed (instead: run queue at end_io) helps. Though I don't yet see why that should be needed. Mike ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE 2017-09-19 15:56 ` Mike Snitzer @ 2017-09-19 16:04 ` Ming Lei 2017-09-19 16:49 ` Bart Van Assche 0 siblings, 1 reply; 20+ messages in thread From: Ming Lei @ 2017-09-19 16:04 UTC (permalink / raw) To: Mike Snitzer Cc: Bart Van Assche, linux-block@vger.kernel.org, hch@infradead.org, sagi@grimberg.me, martin.petersen@oracle.com, linux-scsi@vger.kernel.org, axboe@fb.com, linux-nvme@lists.infradead.org, jejb@linux.vnet.ibm.com, loberman@redhat.com, dm-devel@redhat.com On Tue, Sep 19, 2017 at 11:56:03AM -0400, Mike Snitzer wrote: > On Tue, Sep 19 2017 at 11:36am -0400, > Bart Van Assche <Bart.VanAssche@wdc.com> wrote: > > > On Tue, 2017-09-19 at 13:43 +0800, Ming Lei wrote: > > > On Mon, Sep 18, 2017 at 03:18:16PM +0000, Bart Van Assche wrote: > > > > If you are still looking at removing the blk_mq_delay_run_hw_queue() calls > > > > then I think you are looking in the wrong direction. What kind of problem > > > > are you trying to solve? Is it perhaps that there can be a delay between > > > > > > Actually the improvement on dm-rq IO schedule(the patch 2 ~ 5) doesn't > > > need this patch. > > > > The approach of this patch series looks wrong to me and patch 5/5 is an ugly > > hack in my opinion. That's why I asked you to drop the entire patch series and > > to test whether inserting a queue run call into the dm-mpath end_io callback > > yields a similar performance improvement to the patches you posted. Please do > > not expect me to spend more time on this patch series if you do not come up > > with measurement results for the proposed alternative. > > Bart, asserting that Ming's work is a hack doesn't help your apparent > goal of deligitimizing Ming's work. > > Nor does it take away from the fact that your indecision on appropriate > timeouts (let alone ability to defend and/or explain them) validates > Ming calling them into question (which you are now dodging regularly). 
> But please don't take this feedback and shut-down. Instead please work
> together more constructively. This doesn't need to be adversarial! I
> am at a loss for why there is such animosity from your end Bart.
>
> Please dial it back. It is just a distraction that fosters needless
> in-fighting.
>
> Believe it or not: Ming is just trying to improve the code because he
> has a testbed that is showing fairly abysmal performance with dm-mq
> multipath (on lpfc with scsi-mq).
>
> Ming, if you can: please see if what Bart has proposed (instead: run
> queue at end_io) helps. Though I don't yet see why that should be
> needed.

Running the queue at end_io is definitely wrong, because blk-mq already
has SCHED_RESTART to do that.

The dm-mpath performance issue has nothing to do with that; the issue is
actually very similar to my SCSI-MQ improvement, because right now
dm_dispatch_clone_request() doesn't know whether the underlying queue is
busy, so dm-rq/mpath keeps trying to dispatch more requests to the
underlying queue even though it is busy, and then I/O merging can't be
done easily on dm-rq/mpath.

I am thinking of another way to improve this, since some SCSI devices'
queue_depth differs from .cmd_per_lun; I will post a patchset soon.

-- 
Ming
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
From: Bart Van Assche @ 2017-09-19 16:49 UTC
To: ming.lei@redhat.com, snitzer@redhat.com
Cc: linux-block@vger.kernel.org, hch@infradead.org, sagi@grimberg.me, martin.petersen@oracle.com, linux-scsi@vger.kernel.org, axboe@fb.com, linux-nvme@lists.infradead.org, jejb@linux.vnet.ibm.com, loberman@redhat.com, dm-devel@redhat.com

On Wed, 2017-09-20 at 00:04 +0800, Ming Lei wrote:
> Run queue at end_io is definitely wrong, because blk-mq has SCHED_RESTART
> to do that already.

Sorry but I disagree. If SCHED_RESTART is set that causes the blk-mq core to
reexamine the software queues and the hctx dispatch list but not the requeue
list. If a block driver returns BLK_STS_RESOURCE then requests end up on the
requeue list. Hence the following code in scsi_end_request():

	if (scsi_target(sdev)->single_lun ||
	    !list_empty(&sdev->host->starved_list))
		kblockd_schedule_work(&sdev->requeue_work);
	else
		blk_mq_run_hw_queues(q, true);

Bart.
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
From: Ming Lei @ 2017-09-19 16:55 UTC
To: Bart Van Assche
Cc: ming.lei@redhat.com, snitzer@redhat.com, linux-block@vger.kernel.org, hch@infradead.org, sagi@grimberg.me, martin.petersen@oracle.com, linux-scsi@vger.kernel.org, axboe@fb.com, linux-nvme@lists.infradead.org, jejb@linux.vnet.ibm.com, loberman@redhat.com, dm-devel@redhat.com

On Wed, Sep 20, 2017 at 12:49 AM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> On Wed, 2017-09-20 at 00:04 +0800, Ming Lei wrote:
> > Run queue at end_io is definitely wrong, because blk-mq has SCHED_RESTART
> > to do that already.
>
> Sorry but I disagree. If SCHED_RESTART is set that causes the blk-mq core to
> reexamine the software queues and the hctx dispatch list but not the requeue
> list. If a block driver returns BLK_STS_RESOURCE then requests end up on the
> requeue list. Hence the following code in scsi_end_request():

That doesn't need SCHED_RESTART, because it is the requeue path's own
responsibility: see blk_mq_requeue_work(), which runs the hw queue at
the end of that function.

-- 
Ming Lei
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE 2017-09-19 16:55 ` Ming Lei @ 2017-09-19 18:42 ` Bart Van Assche 2017-09-19 22:44 ` Ming Lei 0 siblings, 1 reply; 20+ messages in thread From: Bart Van Assche @ 2017-09-19 18:42 UTC (permalink / raw) To: tom.leiming@gmail.com Cc: linux-block@vger.kernel.org, hch@infradead.org, sagi@grimberg.me, snitzer@redhat.com, martin.petersen@oracle.com, ming.lei@redhat.com, linux-scsi@vger.kernel.org, axboe@fb.com, linux-nvme@lists.infradead.org, jejb@linux.vnet.ibm.com, loberman@redhat.com, dm-devel@redhat.com On Wed, 2017-09-20 at 00:55 +0800, Ming Lei wrote: > On Wed, Sep 20, 2017 at 12:49 AM, Bart Van Assche > <Bart.VanAssche@wdc.com> wrote: > > On Wed, 2017-09-20 at 00:04 +0800, Ming Lei wrote: > > > Run queue at end_io is definitely wrong, because blk-mq has SCHED_RESTART > > > to do that already. > > > > Sorry but I disagree. If SCHED_RESTART is set that causes the blk-mq core to > > reexamine the software queues and the hctx dispatch list but not the requeue > > list. If a block driver returns BLK_STS_RESOURCE then requests end up on the > > requeue list. Hence the following code in scsi_end_request(): > > That doesn't need SCHED_RESTART, because it is requeue's > responsibility to do that, > see blk_mq_requeue_work(), which will run hw queue at the end of this func. That's not what I was trying to explain. What I was trying to explain is that every block driver that can cause a request to end up on the requeue list is responsible for kicking the requeue list at a later time. Hence the kblockd_schedule_work(&sdev->requeue_work) call in the SCSI core and the blk_mq_kick_requeue_list() and blk_mq_delay_kick_requeue_list() calls in the dm code. What I would like to see is measurement results for dm-mpath without this patch series and a call to kick the requeue list added to the dm-mpath end_io code. Bart. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE 2017-09-19 18:42 ` Bart Van Assche @ 2017-09-19 22:44 ` Ming Lei 2017-09-19 23:25 ` Bart Van Assche 0 siblings, 1 reply; 20+ messages in thread From: Ming Lei @ 2017-09-19 22:44 UTC (permalink / raw) To: Bart Van Assche Cc: tom.leiming@gmail.com, linux-block@vger.kernel.org, hch@infradead.org, sagi@grimberg.me, snitzer@redhat.com, martin.petersen@oracle.com, linux-scsi@vger.kernel.org, axboe@fb.com, linux-nvme@lists.infradead.org, jejb@linux.vnet.ibm.com, loberman@redhat.com, dm-devel@redhat.com On Tue, Sep 19, 2017 at 06:42:30PM +0000, Bart Van Assche wrote: > On Wed, 2017-09-20 at 00:55 +0800, Ming Lei wrote: > > On Wed, Sep 20, 2017 at 12:49 AM, Bart Van Assche > > <Bart.VanAssche@wdc.com> wrote: > > > On Wed, 2017-09-20 at 00:04 +0800, Ming Lei wrote: > > > > Run queue at end_io is definitely wrong, because blk-mq has SCHED_RESTART > > > > to do that already. > > > > > > Sorry but I disagree. If SCHED_RESTART is set that causes the blk-mq core to > > > reexamine the software queues and the hctx dispatch list but not the requeue > > > list. If a block driver returns BLK_STS_RESOURCE then requests end up on the > > > requeue list. Hence the following code in scsi_end_request(): > > > > That doesn't need SCHED_RESTART, because it is requeue's > > responsibility to do that, > > see blk_mq_requeue_work(), which will run hw queue at the end of this func. > > That's not what I was trying to explain. What I was trying to explain is that > every block driver that can cause a request to end up on the requeue list is > responsible for kicking the requeue list at a later time. Hence the > kblockd_schedule_work(&sdev->requeue_work) call in the SCSI core and the > blk_mq_kick_requeue_list() and blk_mq_delay_kick_requeue_list() calls in the > dm code. 
> What I would like to see is measurement results for dm-mpath without
> this patch series and a call to kick the requeue list added to the
> dm-mpath end_io code.

For this issue, SCSI and dm-rq are not the same.

We don't need to run the queue in dm's .end_io, and the theory is
simple; otherwise this wouldn't be a performance issue, it would be an
I/O hang:

1) every dm-rq request is mapped 1:1 to a SCSI request

2) if any mapped SCSI request is unfinished, whether in flight, on the
   requeue list, or elsewhere, there is one corresponding dm-rq request
   in flight

3) once the mapped SCSI request completes, dm-rq's completion path is
   triggered and dm-rq's queue is rerun because of SCHED_RESTART in
   dm-rq

So the hw queue of dm-rq has already been run in dm-rq's completion
path, right? Why do we need to do it again in the hot path?

-- 
Ming
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE 2017-09-19 22:44 ` Ming Lei @ 2017-09-19 23:25 ` Bart Van Assche 2017-09-19 23:50 ` Mike Snitzer 0 siblings, 1 reply; 20+ messages in thread From: Bart Van Assche @ 2017-09-19 23:25 UTC (permalink / raw) To: ming.lei@redhat.com Cc: linux-block@vger.kernel.org, hch@infradead.org, tom.leiming@gmail.com, sagi@grimberg.me, snitzer@redhat.com, martin.petersen@oracle.com, linux-scsi@vger.kernel.org, axboe@fb.com, linux-nvme@lists.infradead.org, jejb@linux.vnet.ibm.com, loberman@redhat.com, dm-devel@redhat.com On Wed, 2017-09-20 at 06:44 +0800, Ming Lei wrote: > For this issue, it isn't same between SCSI and dm-rq. > > We don't need to run queue in .end_io of dm, and the theory is > simple, otherwise it isn't performance issue, and should be I/O hang. > > 1) every dm-rq's request is 1:1 mapped to SCSI's request > > 2) if there is any mapped SCSI request not finished, either > in-flight or in requeue list or whatever, there will be one > corresponding dm-rq's request in-flight > > 3) once the mapped SCSI request is completed, dm-rq's completion > path will be triggered and dm-rq's queue will be rerun because of > SCHED_RESTART in dm-rq > > So the hw queue of dm-rq has been run in dm-rq's completion path > already, right? Why do we need to do it again in the hot path? The measurement data in the description of patch 5/5 shows a significant performance regression for an important workload, namely random I/O. Additionally, the performance improvement for sequential I/O was achieved for an unrealistically low queue depth. Sorry but given these measurement results I don't see why I should spend more time on this patch series. Bart. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE 2017-09-19 23:25 ` Bart Van Assche @ 2017-09-19 23:50 ` Mike Snitzer 2017-09-20 1:13 ` Ming Lei 0 siblings, 1 reply; 20+ messages in thread From: Mike Snitzer @ 2017-09-19 23:50 UTC (permalink / raw) To: Bart Van Assche Cc: ming.lei@redhat.com, linux-block@vger.kernel.org, hch@infradead.org, tom.leiming@gmail.com, sagi@grimberg.me, martin.petersen@oracle.com, linux-scsi@vger.kernel.org, axboe@fb.com, linux-nvme@lists.infradead.org, jejb@linux.vnet.ibm.com, loberman@redhat.com, dm-devel@redhat.com On Tue, Sep 19 2017 at 7:25pm -0400, Bart Van Assche <Bart.VanAssche@wdc.com> wrote: > On Wed, 2017-09-20 at 06:44 +0800, Ming Lei wrote: > > For this issue, it isn't same between SCSI and dm-rq. > > > > We don't need to run queue in .end_io of dm, and the theory is > > simple, otherwise it isn't performance issue, and should be I/O hang. > > > > 1) every dm-rq's request is 1:1 mapped to SCSI's request > > > > 2) if there is any mapped SCSI request not finished, either > > in-flight or in requeue list or whatever, there will be one > > corresponding dm-rq's request in-flight > > > > 3) once the mapped SCSI request is completed, dm-rq's completion > > path will be triggered and dm-rq's queue will be rerun because of > > SCHED_RESTART in dm-rq > > > > So the hw queue of dm-rq has been run in dm-rq's completion path > > already, right? Why do we need to do it again in the hot path? > > The measurement data in the description of patch 5/5 shows a significant > performance regression for an important workload, namely random I/O. > Additionally, the performance improvement for sequential I/O was achieved > for an unrealistically low queue depth. So you've ignored Ming's question entirely and instead decided to focus on previous questions you raised to Ming that he ignored. This is getting tedious. 
Especially given that Ming said the first patch that all this fighting has been over isn't even required to attain the improvements. Ming, please retest both your baseline and patched setup with a queue_depth of >= 32. Also, please do 3 - 5 runs to get a avg and std dev across the runs. > Sorry but given these measurement results I don't see why I should > spend more time on this patch series. Bart, I've historically genuinely always appreciated your review and insight. But if your future "review" on this patchset would take the form shown in this thread then please don't spend more time on it. Mike ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE 2017-09-19 23:50 ` Mike Snitzer @ 2017-09-20 1:13 ` Ming Lei 0 siblings, 0 replies; 20+ messages in thread From: Ming Lei @ 2017-09-20 1:13 UTC (permalink / raw) To: Mike Snitzer Cc: Bart Van Assche, linux-block@vger.kernel.org, hch@infradead.org, tom.leiming@gmail.com, sagi@grimberg.me, martin.petersen@oracle.com, linux-scsi@vger.kernel.org, axboe@fb.com, linux-nvme@lists.infradead.org, jejb@linux.vnet.ibm.com, loberman@redhat.com, dm-devel@redhat.com Hi Mike, On Tue, Sep 19, 2017 at 07:50:06PM -0400, Mike Snitzer wrote: > On Tue, Sep 19 2017 at 7:25pm -0400, > Bart Van Assche <Bart.VanAssche@wdc.com> wrote: > > > On Wed, 2017-09-20 at 06:44 +0800, Ming Lei wrote: > > > For this issue, it isn't same between SCSI and dm-rq. > > > > > > We don't need to run queue in .end_io of dm, and the theory is > > > simple, otherwise it isn't performance issue, and should be I/O hang. > > > > > > 1) every dm-rq's request is 1:1 mapped to SCSI's request > > > > > > 2) if there is any mapped SCSI request not finished, either > > > in-flight or in requeue list or whatever, there will be one > > > corresponding dm-rq's request in-flight > > > > > > 3) once the mapped SCSI request is completed, dm-rq's completion > > > path will be triggered and dm-rq's queue will be rerun because of > > > SCHED_RESTART in dm-rq > > > > > > So the hw queue of dm-rq has been run in dm-rq's completion path > > > already, right? Why do we need to do it again in the hot path? > > > > The measurement data in the description of patch 5/5 shows a significant > > performance regression for an important workload, namely random I/O. > > Additionally, the performance improvement for sequential I/O was achieved > > for an unrealistically low queue depth. > > So you've ignored Ming's question entirely and instead decided to focus > on previous questions you raised to Ming that he ignored. This is > getting tedious. 
Sorry for not making it clear: I mentioned that I will post a new
version to address the random I/O regression.

> Especially given that Ming said the first patch that all this fighting
> has been over isn't even required to attain the improvements.
>
> Ming, please retest both your baseline and patched setup with a
> queue_depth of >= 32.  Also, please do 3 - 5 runs to get a avg and std
> dev across the runs.

Taking a bigger queue_depth won't help with this issue, and it can make
the situation worse, because .cmd_per_lun won't be changed, and the
queue often becomes busy when the number of in-flight requests is
bigger than .cmd_per_lun.

I will post one new version, which will use another simple way to
figure out if the underlying queue is busy, so that random I/O perf
won't be affected, but this new version needs to depend on the
following patchset:

	https://marc.info/?t=150436555700002&r=1&w=2

So it may take a while since that patchset is still under review.  I
will post them all together in 'blk-mq-sched: improve SCSI-MQ
performance (V5)'.

The approach taken in patch 5 depends on q->queue_depth, but some SCSI
hosts' .cmd_per_lun is different from q->queue_depth, which causes the
random I/O regression.

--
Ming

^ permalink raw reply	[flat|nested] 20+ messages in thread
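[Editorial illustration] Ming's point about .cmd_per_lun can be made
concrete with a toy model. This is a hedged Python sketch, not kernel
code; the function name and parameters are invented for illustration,
but the relationship mirrors what the message describes: when the
per-LUN command limit is the effective bottleneck, enlarging the
block-layer queue_depth alone does not stop dispatch from seeing busy.

```python
# Toy model: the device rejects a new command based on the smaller of
# the block-layer queue depth and the per-LUN command limit, so raising
# queue_depth cannot help once cmd_per_lun is the binding constraint.

def can_dispatch(in_flight, queue_depth, cmd_per_lun):
    # Both limits must allow another command; the smaller one wins.
    return in_flight < min(queue_depth, cmd_per_lun)

# With cmd_per_lun = 3, the 4th in-flight command is rejected no matter
# how large queue_depth is made.
print(can_dispatch(in_flight=3, queue_depth=32, cmd_per_lun=3))   # False
print(can_dispatch(in_flight=3, queue_depth=256, cmd_per_lun=3))  # False
print(can_dispatch(in_flight=2, queue_depth=32, cmd_per_lun=3))   # True
```

This also sketches why patch 5's reliance on q->queue_depth misfires on
hosts where .cmd_per_lun differs from q->queue_depth: the busy
condition is governed by the smaller limit.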
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
  2017-09-19 16:49         ` Bart Van Assche
  2017-09-19 16:55           ` Ming Lei
@ 2017-09-20  1:19           ` Ming Lei
  1 sibling, 0 replies; 20+ messages in thread
From: Ming Lei @ 2017-09-20  1:19 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: snitzer@redhat.com, linux-block@vger.kernel.org,
	hch@infradead.org, sagi@grimberg.me, martin.petersen@oracle.com,
	linux-scsi@vger.kernel.org, axboe@fb.com,
	linux-nvme@lists.infradead.org, jejb@linux.vnet.ibm.com,
	loberman@redhat.com, dm-devel@redhat.com

On Tue, Sep 19, 2017 at 04:49:15PM +0000, Bart Van Assche wrote:
> On Wed, 2017-09-20 at 00:04 +0800, Ming Lei wrote:
> > Run queue at end_io is definitely wrong, because blk-mq has SCHED_RESTART
> > to do that already.
>
> Sorry but I disagree. If SCHED_RESTART is set that causes the blk-mq core to
> reexamine the software queues and the hctx dispatch list but not the requeue
> list. If a block driver returns BLK_STS_RESOURCE then requests end up on the
> requeue list. Hence the following code in scsi_end_request():
>
> 	if (scsi_target(sdev)->single_lun || !list_empty(&sdev->host->starved_list))
> 		kblockd_schedule_work(&sdev->requeue_work);
> 	else
> 		blk_mq_run_hw_queues(q, true);

Let's focus on dm-rq.

I have explained before that dm-rq is different from SCSI: we don't
need to requeue the request any more in dm-rq if blk_get_request()
returns NULL in multipath_clone_and_map(), since SCHED_RESTART covers
that definitely.

Not to mention that dm_mq_delay_requeue_request() will run the queue
explicitly if it has to be done.  That doesn't need SCHED_RESTART to
cover it, right?

--
Ming

^ permalink raw reply	[flat|nested] 20+ messages in thread
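[Editorial illustration] Bart's distinction above — a SCHED_RESTART-style
rerun revisits the dispatch list but not the requeue list — can be
sketched with a toy model. This is hedged Python, not the real blk-mq or
SCSI code; all names are invented for illustration:

```python
# Toy model: a request parked on a separate requeue list is invisible
# to a restart that only rescans the dispatch list, which is why an
# explicit queue run (like the one in scsi_end_request()) is needed.

dispatch_list = []
requeue_list = []
done = []

def queue_rq(rq, device_busy):
    if device_busy:
        requeue_list.append(rq)   # the BLK_STS_RESOURCE path
    else:
        done.append(rq)

def sched_restart():
    # Only re-dispatches what is on the dispatch list.
    while dispatch_list:
        queue_rq(dispatch_list.pop(0), device_busy=False)

def run_requeue_work():
    # The extra, explicit run that also drains the requeue list.
    dispatch_list.extend(requeue_list)
    requeue_list.clear()
    sched_restart()

queue_rq("rq1", device_busy=True)   # rq1 lands on the requeue list
sched_restart()                     # rq1 is NOT picked up here
assert requeue_list == ["rq1"]
run_requeue_work()                  # explicit run drains it
assert done == ["rq1"]
```

Ming's counterpoint for dm-rq is that the BLK_STS_RESOURCE path there
never parks the request on such a list in the first place, so in this
model the restart alone would suffice.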
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
  2017-09-19  5:43     ` Ming Lei
  2017-09-19 15:36       ` Bart Van Assche
@ 2017-09-19 15:48       ` Mike Snitzer
  2017-09-19 15:52         ` Bart Van Assche
  2017-09-19 16:07         ` Ming Lei
  1 sibling, 2 replies; 20+ messages in thread
From: Mike Snitzer @ 2017-09-19 15:48 UTC (permalink / raw)
  To: Ming Lei
  Cc: Bart Van Assche, hch@infradead.org, jejb@linux.vnet.ibm.com,
	sagi@grimberg.me, linux-scsi@vger.kernel.org, axboe@fb.com,
	linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	dm-devel@redhat.com, martin.petersen@oracle.com,
	loberman@redhat.com

On Tue, Sep 19 2017 at  1:43am -0400,
Ming Lei <ming.lei@redhat.com> wrote:

> On Mon, Sep 18, 2017 at 03:18:16PM +0000, Bart Van Assche wrote:
> > On Sun, 2017-09-17 at 20:40 +0800, Ming Lei wrote:
> > > "if no request has completed before the delay has expired" can't be a
> > > reason to rerun the queue, because the queue can still be busy.
> >
> > That statement of you shows that there are important aspects of the SCSI
> > core and dm-mpath driver that you don't understand.
>
> Then can you tell me why blk-mq's SCHED_RESTART can't cover
> the rerun when there are in-flight requests? What is the case
> in which dm-rq can return BUSY and there aren't any in-flight
> requests meantime?
>
> Also you are the author of adding 'blk_mq_delay_run_hw_queue(
> hctx, 100/*ms*/)' in dm-rq, you never explain in commit
> 6077c2d706097c0 ("dm rq: Avoid that request processing stalls
> sporadically") what the root cause is for your request stall
> and why this patch fixes your issue. Even you don't explain
> why is the delay 100ms?
>
> So it is a workaound, isn't it?
>
> My concern is that it isn't good to add blk_mq_delay_run_hw_queue(hctx, 100/*ms*/)
> in the hot path since it should been covered by SCHED_RESTART
> if there are in-flight requests.
This thread proves that it is definitely brittle to be relying on fixed
delays like this:
https://patchwork.kernel.org/patch/9703249/

Mike

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
  2017-09-19 15:48       ` Mike Snitzer
@ 2017-09-19 15:52         ` Bart Van Assche
  2017-09-19 16:03           ` Mike Snitzer
  1 sibling, 1 reply; 20+ messages in thread
From: Bart Van Assche @ 2017-09-19 15:52 UTC (permalink / raw)
  To: snitzer@redhat.com, ming.lei@redhat.com
  Cc: linux-block@vger.kernel.org, hch@infradead.org, sagi@grimberg.me,
	martin.petersen@oracle.com, linux-scsi@vger.kernel.org,
	axboe@fb.com, linux-nvme@lists.infradead.org,
	jejb@linux.vnet.ibm.com, loberman@redhat.com, dm-devel@redhat.com

On Tue, 2017-09-19 at 11:48 -0400, Mike Snitzer wrote:
> This thread proves that it is definitely brittle to be relying on fixed
> delays like this:
> https://patchwork.kernel.org/patch/9703249/

Hello Mike,

Sorry but I think that's a misinterpretation of my patch. I came up with
that patch before I had found and fixed the root cause of the high
system load. These fixes are upstream (kernel v4.13), which means that
that patch is now obsolete.

Bart.

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
  2017-09-19 15:52         ` Bart Van Assche
@ 2017-09-19 16:03           ` Mike Snitzer
  0 siblings, 0 replies; 20+ messages in thread
From: Mike Snitzer @ 2017-09-19 16:03 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: ming.lei@redhat.com, linux-block@vger.kernel.org,
	hch@infradead.org, sagi@grimberg.me, martin.petersen@oracle.com,
	linux-scsi@vger.kernel.org, axboe@fb.com,
	linux-nvme@lists.infradead.org, jejb@linux.vnet.ibm.com,
	loberman@redhat.com, dm-devel@redhat.com

On Tue, Sep 19 2017 at 11:52am -0400,
Bart Van Assche <Bart.VanAssche@wdc.com> wrote:

> On Tue, 2017-09-19 at 11:48 -0400, Mike Snitzer wrote:
> > This thread proves that it is definitely brittle to be relying on fixed
> > delays like this:
> > https://patchwork.kernel.org/patch/9703249/
>
> Hello Mike,
>
> Sorry but I think that's a misinterpretation of my patch. I came up with that
> patch before I had found and fixed the root cause of the high system load.
> These fixes are upstream (kernel v4.13) which means that that patch is now
> obsolete.

OK, fair point.  Though fixed magic values for delay aren't going to
stand the test of time.

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
  2017-09-19 15:48       ` Mike Snitzer
  2017-09-19 15:52         ` Bart Van Assche
@ 2017-09-19 16:07         ` Ming Lei
  1 sibling, 0 replies; 20+ messages in thread
From: Ming Lei @ 2017-09-19 16:07 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Bart Van Assche, hch@infradead.org, jejb@linux.vnet.ibm.com,
	sagi@grimberg.me, linux-scsi@vger.kernel.org, axboe@fb.com,
	linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	dm-devel@redhat.com, martin.petersen@oracle.com,
	loberman@redhat.com

On Tue, Sep 19, 2017 at 11:48:23AM -0400, Mike Snitzer wrote:
> On Tue, Sep 19 2017 at  1:43am -0400,
> Ming Lei <ming.lei@redhat.com> wrote:
>
> > On Mon, Sep 18, 2017 at 03:18:16PM +0000, Bart Van Assche wrote:
> > > On Sun, 2017-09-17 at 20:40 +0800, Ming Lei wrote:
> > > > "if no request has completed before the delay has expired" can't be a
> > > > reason to rerun the queue, because the queue can still be busy.
> > >
> > > That statement of you shows that there are important aspects of the SCSI
> > > core and dm-mpath driver that you don't understand.
> >
> > Then can you tell me why blk-mq's SCHED_RESTART can't cover
> > the rerun when there are in-flight requests? What is the case
> > in which dm-rq can return BUSY and there aren't any in-flight
> > requests meantime?
> >
> > Also you are the author of adding 'blk_mq_delay_run_hw_queue(
> > hctx, 100/*ms*/)' in dm-rq, you never explain in commit
> > 6077c2d706097c0 ("dm rq: Avoid that request processing stalls
> > sporadically") what the root cause is for your request stall
> > and why this patch fixes your issue. Even you don't explain
> > why is the delay 100ms?
> >
> > So it is a workaound, isn't it?
> >
> > My concern is that it isn't good to add blk_mq_delay_run_hw_queue(hctx, 100/*ms*/)
> > in the hot path since it should been covered by SCHED_RESTART
> > if there are in-flight requests.
>
> This thread proves that it is definitely brittle to be relying on fixed
> delays like this:
> https://patchwork.kernel.org/patch/9703249/

I can't agree more.  Because no one mentioned the root cause, maybe the
request stall has been fixed recently.  Keeping the workaround in the
hot path is a bit annoying.

--
Ming

^ permalink raw reply	[flat|nested] 20+ messages in thread
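[Editorial illustration] The disagreement above is over whether a fixed
timer-based rerun is needed at all when completions already trigger a
restart. Ming's argument can be sketched with a toy model — hedged
Python, not kernel code, with all names invented for illustration: when
dispatch fails for lack of resources, a restart flag is set, and the
completion of any in-flight request reruns the queue, so forward
progress does not depend on a 100 ms timer.

```python
# Toy model of the completion-driven SCHED_RESTART idea: a failed
# dispatch marks the queue for restart, and the next completion reruns
# it, so no delayed (timer-based) run is needed for forward progress.

class ToyQueue:
    def __init__(self, depth):
        self.depth = depth          # device resource budget
        self.in_flight = 0
        self.pending = []           # requests waiting to be dispatched
        self.restart_needed = False

    def dispatch(self):
        while self.pending:
            if self.in_flight >= self.depth:
                # BLK_STS_RESOURCE case: leave the request queued and
                # arrange for a rerun on the next completion.
                self.restart_needed = True
                return False
            self.pending.pop(0)
            self.in_flight += 1
        return True

    def complete_one(self):
        self.in_flight -= 1
        if self.restart_needed:     # SCHED_RESTART-style rerun
            self.restart_needed = False
            self.dispatch()

q = ToyQueue(depth=1)
q.pending = ["rq1", "rq2"]
q.dispatch()                 # rq1 dispatched, rq2 blocked -> restart set
q.complete_one()             # completion reruns the queue, rq2 dispatched
```

The unresolved question in the thread is the hang case this model cannot
show: whether a driver can return busy while there are no in-flight
requests at all, in which case no completion ever arrives to trigger the
restart — which is what the timer-based rerun papered over.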
end of thread, other threads:[~2017-09-20 1:19 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <20170915164456.9803-1-ming.lei@redhat.com>
2017-09-15 16:44 ` [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE Ming Lei
2017-09-15 17:57 ` Bart Van Assche
2017-09-17 12:40 ` Ming Lei
2017-09-18 15:18 ` Bart Van Assche
2017-09-19 5:43 ` Ming Lei
2017-09-19 15:36 ` Bart Van Assche
2017-09-19 15:56 ` Mike Snitzer
2017-09-19 16:04 ` Ming Lei
2017-09-19 16:49 ` Bart Van Assche
2017-09-19 16:55 ` Ming Lei
2017-09-19 18:42 ` Bart Van Assche
2017-09-19 22:44 ` Ming Lei
2017-09-19 23:25 ` Bart Van Assche
2017-09-19 23:50 ` Mike Snitzer
2017-09-20 1:13 ` Ming Lei
2017-09-20 1:19 ` Ming Lei
2017-09-19 15:48 ` Mike Snitzer
2017-09-19 15:52 ` Bart Van Assche
2017-09-19 16:03 ` Mike Snitzer
2017-09-19 16:07 ` Ming Lei