* [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
[not found] <20170915164456.9803-1-ming.lei@redhat.com>
@ 2017-09-15 16:44 ` Ming Lei
2017-09-15 17:57 ` Bart Van Assche
0 siblings, 1 reply; 20+ messages in thread
From: Ming Lei @ 2017-09-15 16:44 UTC (permalink / raw)
To: dm-devel, Mike Snitzer, Jens Axboe
Cc: linux-block, Christoph Hellwig, Bart Van Assche, Laurence Oberman,
Ming Lei, Sagi Grimberg, linux-nvme, James E.J. Bottomley,
Martin K. Petersen, linux-scsi
If .queue_rq() returns BLK_STS_RESOURCE, blk-mq will rerun
the queue in one of three situations:

1) BLK_MQ_S_SCHED_RESTART is set
- the queue is rerun after a request completes; see blk_mq_sched_restart(),
which is called from blk_mq_free_request()

2) BLK_MQ_S_TAG_WAITING is set
- the queue is rerun after a tag is freed

3) otherwise
- the queue is run immediately in blk_mq_dispatch_rq_list()

So calling blk_mq_delay_run_hw_queue() inside .queue_rq() doesn't make
sense: whether or not it is called, the queue will still be rerun soon
in the three situations above, and the busy request can be dispatched
again.
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-nvme@lists.infradead.org
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/md/dm-rq.c      | 1 -
 drivers/nvme/host/fc.c  | 3 ---
 drivers/scsi/scsi_lib.c | 4 ----
 3 files changed, 8 deletions(-)
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index c6ebc5b1e00e..71422cea1c4a 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -761,7 +761,6 @@ static blk_status_t dm_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
 		/* Undo dm_start_request() before requeuing */
 		rq_end_stats(md, rq);
 		rq_completed(md, rq_data_dir(rq), false);
-		blk_mq_delay_run_hw_queue(hctx, 100/*ms*/);
 		return BLK_STS_RESOURCE;
 	}
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index d2e882c0f496..17727eee5d5f 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2041,9 +2041,6 @@ nvme_fc_start_fcp_op(struct nvme_fc_ctrl *ctrl, struct nvme_fc_queue *queue,
 	return BLK_STS_OK;
 
 busy:
-	if (!(op->flags & FCOP_FLAGS_AEN) && queue->hctx)
-		blk_mq_delay_run_hw_queue(queue->hctx, NVMEFC_QUEUE_DELAY);
-
 	return BLK_STS_RESOURCE;
 }
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 9cf6a80fe297..71d5bf1345c9 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2010,11 +2010,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 out:
 	switch (ret) {
 	case BLK_STS_OK:
-		break;
 	case BLK_STS_RESOURCE:
-		if (atomic_read(&sdev->device_busy) == 0 &&
-		    !scsi_device_blocked(sdev))
-			blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY);
 		break;
 	default:
 		/*
--
2.9.5
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
2017-09-15 16:44 ` [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE Ming Lei
@ 2017-09-15 17:57 ` Bart Van Assche
2017-09-17 12:40 ` Ming Lei
0 siblings, 1 reply; 20+ messages in thread
From: Bart Van Assche @ 2017-09-15 17:57 UTC (permalink / raw)
To: dm-devel@redhat.com, axboe@fb.com, ming.lei@redhat.com,
snitzer@redhat.com
Cc: linux-block@vger.kernel.org, hch@infradead.org, sagi@grimberg.me,
Bart Van Assche, martin.petersen@oracle.com,
linux-scsi@vger.kernel.org, linux-nvme@lists.infradead.org,
jejb@linux.vnet.ibm.com, loberman@redhat.com
On Sat, 2017-09-16 at 00:44 +0800, Ming Lei wrote:
> If .queue_rq() returns BLK_STS_RESOURCE, blk-mq will rerun
> the queue in one of three situations:
>
> 1) BLK_MQ_S_SCHED_RESTART is set
> - the queue is rerun after a request completes; see blk_mq_sched_restart(),
> which is called from blk_mq_free_request()
>
> 2) BLK_MQ_S_TAG_WAITING is set
> - the queue is rerun after a tag is freed
>
> 3) otherwise
> - the queue is run immediately in blk_mq_dispatch_rq_list()
>
> So calling blk_mq_delay_run_hw_queue() inside .queue_rq() doesn't make
> sense: whether or not it is called, the queue will still be rerun soon
> in the three situations above, and the busy request can be dispatched
> again.
NAK
Block drivers call blk_mq_delay_run_hw_queue() if it is known that the queue
has to be rerun later even if no request has completed before the delay has
expired. This patch breaks at least the SCSI core and the dm-mpath drivers.
Bart.
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
2017-09-15 17:57 ` Bart Van Assche
@ 2017-09-17 12:40 ` Ming Lei
2017-09-18 15:18 ` Bart Van Assche
0 siblings, 1 reply; 20+ messages in thread
From: Ming Lei @ 2017-09-17 12:40 UTC (permalink / raw)
To: Bart Van Assche
Cc: dm-devel@redhat.com, axboe@fb.com, snitzer@redhat.com,
linux-block@vger.kernel.org, hch@infradead.org, sagi@grimberg.me,
martin.petersen@oracle.com, linux-scsi@vger.kernel.org,
linux-nvme@lists.infradead.org, jejb@linux.vnet.ibm.com,
loberman@redhat.com
On Fri, Sep 15, 2017 at 05:57:31PM +0000, Bart Van Assche wrote:
> On Sat, 2017-09-16 at 00:44 +0800, Ming Lei wrote:
> > If .queue_rq() returns BLK_STS_RESOURCE, blk-mq will rerun
> > the queue in one of three situations:
> >
> > 1) BLK_MQ_S_SCHED_RESTART is set
> > - the queue is rerun after a request completes; see blk_mq_sched_restart(),
> > which is called from blk_mq_free_request()
> >
> > 2) BLK_MQ_S_TAG_WAITING is set
> > - the queue is rerun after a tag is freed
> >
> > 3) otherwise
> > - the queue is run immediately in blk_mq_dispatch_rq_list()
> >
> > So calling blk_mq_delay_run_hw_queue() inside .queue_rq() doesn't make
> > sense: whether or not it is called, the queue will still be rerun soon
> > in the three situations above, and the busy request can be dispatched
> > again.
>
> NAK
>
> Block drivers call blk_mq_delay_run_hw_queue() if it is known that the queue
> has to be rerun later even if no request has completed before the delay has
> expired. This patch breaks at least the SCSI core and the dm-mpath drivers.
"if no request has completed before the delay has expired" can't be a
reason to rerun the queue, because the queue can still be busy.

The only such case would be one where there aren't any requests in
flight but the queue is still busy; I'd rather understand what the
exact situation is instead of relying on this kind of workaround. If
no such situation exists, we should remove the call.

For SCSI, it might be reasonable to run the hw queue after some delay
when atomic_read(&sdev->device_busy) is zero, which would mean the
queue can be busy even with no requests in flight. Could you or
someone else describe what that case is? Then we can avoid using this
workaround.

For dm-mpath, it might be related to path state, but I have to say it
is still a workaround.

I suggest understanding the root cause instead of keeping this ugly
arbitrary delay, since rerunning the hw queue after 100ms is likely
useless 99.99% of the time.
--
Ming
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
2017-09-17 12:40 ` Ming Lei
@ 2017-09-18 15:18 ` Bart Van Assche
2017-09-19 5:43 ` Ming Lei
0 siblings, 1 reply; 20+ messages in thread
From: Bart Van Assche @ 2017-09-18 15:18 UTC (permalink / raw)
To: ming.lei@redhat.com
Cc: linux-block@vger.kernel.org, hch@infradead.org, sagi@grimberg.me,
snitzer@redhat.com, martin.petersen@oracle.com,
linux-scsi@vger.kernel.org, axboe@fb.com,
linux-nvme@lists.infradead.org, jejb@linux.vnet.ibm.com,
loberman@redhat.com, dm-devel@redhat.com
On Sun, 2017-09-17 at 20:40 +0800, Ming Lei wrote:
> "if no request has completed before the delay has expired" can't be a
> reason to rerun the queue, because the queue can still be busy.
That statement of yours shows that there are important aspects of the SCSI
core and dm-mpath driver that you don't understand.
> I suggest understanding the root cause instead of keeping this ugly
> arbitrary delay, since rerunning the hw queue after 100ms is likely
> useless 99.99% of the time.
If you are still looking at removing the blk_mq_delay_run_hw_queue() calls
then I think you are looking in the wrong direction. What kind of problem
are you trying to solve? Is it perhaps that there can be a delay between
dm-mpath request completion and the queueing of a new request? If so,
adding a queue run call into the dm-mpath end_io callback is probably
sufficient and probably can replace this entire patch series.
Bart.
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
2017-09-18 15:18 ` Bart Van Assche
@ 2017-09-19 5:43 ` Ming Lei
2017-09-19 15:36 ` Bart Van Assche
2017-09-19 15:48 ` Mike Snitzer
0 siblings, 2 replies; 20+ messages in thread
From: Ming Lei @ 2017-09-19 5:43 UTC (permalink / raw)
To: Bart Van Assche
Cc: hch@infradead.org, jejb@linux.vnet.ibm.com, sagi@grimberg.me,
linux-scsi@vger.kernel.org, axboe@fb.com, snitzer@redhat.com,
linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
dm-devel@redhat.com, martin.petersen@oracle.com,
loberman@redhat.com
On Mon, Sep 18, 2017 at 03:18:16PM +0000, Bart Van Assche wrote:
> On Sun, 2017-09-17 at 20:40 +0800, Ming Lei wrote:
> > "if no request has completed before the delay has expired" can't be a
> > reason to rerun the queue, because the queue can still be busy.
>
> That statement of yours shows that there are important aspects of the SCSI
> core and dm-mpath driver that you don't understand.
Then can you tell me why blk-mq's SCHED_RESTART can't cover
the rerun when there are in-flight requests? What is the case
in which dm-rq can return BUSY while there aren't any in-flight
requests at the same time?

Also, you are the author of the 'blk_mq_delay_run_hw_queue(
hctx, 100/*ms*/)' call in dm-rq, yet you never explained in commit
6077c2d706097c0 ("dm rq: Avoid that request processing stalls
sporadically") what the root cause of your request stall is or
why that patch fixes your issue. You don't even explain
why the delay is 100ms.

So it is a workaround, isn't it?

My concern is that it isn't good to add blk_mq_delay_run_hw_queue(hctx, 100/*ms*/)
in the hot path, since that case should already be covered by SCHED_RESTART
when there are in-flight requests.
>
> > I suggest understanding the root cause instead of keeping this ugly
> > arbitrary delay, since rerunning the hw queue after 100ms is likely
> > useless 99.99% of the time.
>
> If you are still looking at removing the blk_mq_delay_run_hw_queue() calls
> then I think you are looking in the wrong direction. What kind of problem
> are you trying to solve? Is it perhaps that there can be a delay between
Actually the improvement to dm-rq I/O scheduling (patches 2-5)
doesn't need this patch.
--
Ming
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
2017-09-19 5:43 ` Ming Lei
@ 2017-09-19 15:36 ` Bart Van Assche
2017-09-19 15:56 ` Mike Snitzer
2017-09-19 15:48 ` Mike Snitzer
1 sibling, 1 reply; 20+ messages in thread
From: Bart Van Assche @ 2017-09-19 15:36 UTC (permalink / raw)
To: ming.lei@redhat.com
Cc: linux-block@vger.kernel.org, hch@infradead.org, sagi@grimberg.me,
snitzer@redhat.com, martin.petersen@oracle.com,
linux-scsi@vger.kernel.org, axboe@fb.com,
linux-nvme@lists.infradead.org, jejb@linux.vnet.ibm.com,
loberman@redhat.com, dm-devel@redhat.com
On Tue, 2017-09-19 at 13:43 +0800, Ming Lei wrote:
> On Mon, Sep 18, 2017 at 03:18:16PM +0000, Bart Van Assche wrote:
> > If you are still looking at removing the blk_mq_delay_run_hw_queue() calls
> > then I think you are looking in the wrong direction. What kind of problem
> > are you trying to solve? Is it perhaps that there can be a delay between
>
> Actually the improvement to dm-rq I/O scheduling (patches 2-5)
> doesn't need this patch.
The approach of this patch series looks wrong to me and patch 5/5 is an ugly
hack in my opinion. That's why I asked you to drop the entire patch series and
to test whether inserting a queue run call into the dm-mpath end_io callback
yields a similar performance improvement to the patches you posted. Please do
not expect me to spend more time on this patch series if you do not come up
with measurement results for the proposed alternative.
Bart.
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
2017-09-19 5:43 ` Ming Lei
2017-09-19 15:36 ` Bart Van Assche
@ 2017-09-19 15:48 ` Mike Snitzer
2017-09-19 15:52 ` Bart Van Assche
2017-09-19 16:07 ` Ming Lei
1 sibling, 2 replies; 20+ messages in thread
From: Mike Snitzer @ 2017-09-19 15:48 UTC (permalink / raw)
To: Ming Lei
Cc: Bart Van Assche, hch@infradead.org, jejb@linux.vnet.ibm.com,
sagi@grimberg.me, linux-scsi@vger.kernel.org, axboe@fb.com,
linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
dm-devel@redhat.com, martin.petersen@oracle.com,
loberman@redhat.com
On Tue, Sep 19 2017 at 1:43am -0400,
Ming Lei <ming.lei@redhat.com> wrote:
> On Mon, Sep 18, 2017 at 03:18:16PM +0000, Bart Van Assche wrote:
> > On Sun, 2017-09-17 at 20:40 +0800, Ming Lei wrote:
> > > "if no request has completed before the delay has expired" can't be a
> > > reason to rerun the queue, because the queue can still be busy.
> >
> > That statement of yours shows that there are important aspects of the SCSI
> > core and dm-mpath driver that you don't understand.
>
> Then can you tell me why blk-mq's SCHED_RESTART can't cover
> the rerun when there are in-flight requests? What is the case
> in which dm-rq can return BUSY while there aren't any in-flight
> requests at the same time?
>
> Also, you are the author of the 'blk_mq_delay_run_hw_queue(
> hctx, 100/*ms*/)' call in dm-rq, yet you never explained in commit
> 6077c2d706097c0 ("dm rq: Avoid that request processing stalls
> sporadically") what the root cause of your request stall is or
> why that patch fixes your issue. You don't even explain
> why the delay is 100ms.
>
> So it is a workaround, isn't it?
>
> My concern is that it isn't good to add blk_mq_delay_run_hw_queue(hctx, 100/*ms*/)
> in the hot path, since that case should already be covered by SCHED_RESTART
> when there are in-flight requests.
This thread proves that it is definitely brittle to be relying on fixed
delays like this:
https://patchwork.kernel.org/patch/9703249/
Mike
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
2017-09-19 15:48 ` Mike Snitzer
@ 2017-09-19 15:52 ` Bart Van Assche
2017-09-19 16:03 ` Mike Snitzer
2017-09-19 16:07 ` Ming Lei
1 sibling, 1 reply; 20+ messages in thread
From: Bart Van Assche @ 2017-09-19 15:52 UTC (permalink / raw)
To: snitzer@redhat.com, ming.lei@redhat.com
Cc: linux-block@vger.kernel.org, hch@infradead.org, sagi@grimberg.me,
martin.petersen@oracle.com, linux-scsi@vger.kernel.org,
axboe@fb.com, linux-nvme@lists.infradead.org,
jejb@linux.vnet.ibm.com, loberman@redhat.com, dm-devel@redhat.com
On Tue, 2017-09-19 at 11:48 -0400, Mike Snitzer wrote:
> This thread proves that it is definitely brittle to be relying on fixed
> delays like this:
> https://patchwork.kernel.org/patch/9703249/
Hello Mike,
Sorry, but I think that's a misinterpretation of my patch. I came up with that
patch before I had found and fixed the root cause of the high system load.
These fixes are upstream (kernel v4.13), which means that patch is now
obsolete.
Bart.
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
2017-09-19 15:36 ` Bart Van Assche
@ 2017-09-19 15:56 ` Mike Snitzer
2017-09-19 16:04 ` Ming Lei
0 siblings, 1 reply; 20+ messages in thread
From: Mike Snitzer @ 2017-09-19 15:56 UTC (permalink / raw)
To: Bart Van Assche
Cc: ming.lei@redhat.com, linux-block@vger.kernel.org,
hch@infradead.org, sagi@grimberg.me, martin.petersen@oracle.com,
linux-scsi@vger.kernel.org, axboe@fb.com,
linux-nvme@lists.infradead.org, jejb@linux.vnet.ibm.com,
loberman@redhat.com, dm-devel@redhat.com
On Tue, Sep 19 2017 at 11:36am -0400,
Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> On Tue, 2017-09-19 at 13:43 +0800, Ming Lei wrote:
> > On Mon, Sep 18, 2017 at 03:18:16PM +0000, Bart Van Assche wrote:
> > > If you are still looking at removing the blk_mq_delay_run_hw_queue() calls
> > > then I think you are looking in the wrong direction. What kind of problem
> > > are you trying to solve? Is it perhaps that there can be a delay between
> >
> > Actually the improvement on dm-rq IO schedule(the patch 2 ~ 5) doesn't
> > need this patch.
>
> The approach of this patch series looks wrong to me and patch 5/5 is an ugly
> hack in my opinion. That's why I asked you to drop the entire patch series and
> to test whether inserting a queue run call into the dm-mpath end_io callback
> yields a similar performance improvement to the patches you posted. Please do
> not expect me to spend more time on this patch series if you do not come up
> with measurement results for the proposed alternative.
Bart, asserting that Ming's work is a hack doesn't help your apparent
goal of delegitimizing Ming's work.
Nor does it take away from the fact that your indecision on appropriate
timeouts (let alone ability to defend and/or explain them) validates
Ming calling them into question (which you are now dodging regularly).
But please don't take this feedback and shut down. Instead, please work
together more constructively. This doesn't need to be adversarial! I
am at a loss for why there is such animosity from your end, Bart.

Please dial it back. It is just a distraction that fosters needless
in-fighting.
Believe it or not: Ming is just trying to improve the code because he
has a testbed that is showing fairly abysmal performance with dm-mq
multipath (on lpfc with scsi-mq).
Ming, if you can: please see if what Bart has proposed (instead: run
queue at end_io) helps. Though I don't yet see why that should be
needed.
Mike
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
2017-09-19 15:52 ` Bart Van Assche
@ 2017-09-19 16:03 ` Mike Snitzer
0 siblings, 0 replies; 20+ messages in thread
From: Mike Snitzer @ 2017-09-19 16:03 UTC (permalink / raw)
To: Bart Van Assche
Cc: ming.lei@redhat.com, linux-block@vger.kernel.org,
hch@infradead.org, sagi@grimberg.me, martin.petersen@oracle.com,
linux-scsi@vger.kernel.org, axboe@fb.com,
linux-nvme@lists.infradead.org, jejb@linux.vnet.ibm.com,
loberman@redhat.com, dm-devel@redhat.com
On Tue, Sep 19 2017 at 11:52am -0400,
Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> On Tue, 2017-09-19 at 11:48 -0400, Mike Snitzer wrote:
> > This thread proves that it is definitely brittle to be relying on fixed
> > delays like this:
> > https://patchwork.kernel.org/patch/9703249/
>
> Hello Mike,
>
> Sorry, but I think that's a misinterpretation of my patch. I came up with that
> patch before I had found and fixed the root cause of the high system load.
> These fixes are upstream (kernel v4.13), which means that patch is now
> obsolete.
OK, fair point.
Though fixed magic values for delay aren't going to stand the test of
time.
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
2017-09-19 15:56 ` Mike Snitzer
@ 2017-09-19 16:04 ` Ming Lei
2017-09-19 16:49 ` Bart Van Assche
0 siblings, 1 reply; 20+ messages in thread
From: Ming Lei @ 2017-09-19 16:04 UTC (permalink / raw)
To: Mike Snitzer
Cc: Bart Van Assche, linux-block@vger.kernel.org, hch@infradead.org,
sagi@grimberg.me, martin.petersen@oracle.com,
linux-scsi@vger.kernel.org, axboe@fb.com,
linux-nvme@lists.infradead.org, jejb@linux.vnet.ibm.com,
loberman@redhat.com, dm-devel@redhat.com
On Tue, Sep 19, 2017 at 11:56:03AM -0400, Mike Snitzer wrote:
> On Tue, Sep 19 2017 at 11:36am -0400,
> Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
>
> > On Tue, 2017-09-19 at 13:43 +0800, Ming Lei wrote:
> > > On Mon, Sep 18, 2017 at 03:18:16PM +0000, Bart Van Assche wrote:
> > > > If you are still looking at removing the blk_mq_delay_run_hw_queue() calls
> > > > then I think you are looking in the wrong direction. What kind of problem
> > > > are you trying to solve? Is it perhaps that there can be a delay between
> > >
> > > Actually the improvement on dm-rq IO schedule(the patch 2 ~ 5) doesn't
> > > need this patch.
> >
> > The approach of this patch series looks wrong to me and patch 5/5 is an ugly
> > hack in my opinion. That's why I asked you to drop the entire patch series and
> > to test whether inserting a queue run call into the dm-mpath end_io callback
> > yields a similar performance improvement to the patches you posted. Please do
> > not expect me to spend more time on this patch series if you do not come up
> > with measurement results for the proposed alternative.
>
> Bart, asserting that Ming's work is a hack doesn't help your apparent
> goal of delegitimizing Ming's work.
>
> Nor does it take away from the fact that your indecision on appropriate
> timeouts (let alone ability to defend and/or explain them) validates
> Ming calling them into question (which you are now dodging regularly).
>
> But please don't take this feedback and shut down. Instead, please work
> together more constructively. This doesn't need to be adversarial! I
> am at a loss for why there is such animosity from your end, Bart.
>
> Please dial it back. It is just a distraction that fosters needless
> in-fighting.
>
> Believe it or not: Ming is just trying to improve the code because he
> has a testbed that is showing fairly abysmal performance with dm-mq
> multipath (on lpfc with scsi-mq).
>
> Ming, if you can: please see if what Bart has proposed (instead: run
> queue at end_io) helps. Though I don't yet see why that should be
> needed.
Run queue at end_io is definitely wrong, because blk-mq has SCHED_RESTART
to do that already.

The dm-mpath performance issue has nothing to do with that; actually
the issue is very similar to my SCSI-MQ improvement, because right now
dm_dispatch_clone_request() doesn't know whether the underlying queue
is busy or not, and dm-rq/mpath just keeps dispatching more requests
to the underlying queue even when it is busy, so I/O merging can't
easily happen in dm-rq/mpath.

I am thinking of another way to improve this, since some SCSI devices'
queue_depth differs from their .cmd_per_lun; I will post a patchset
soon.
--
Ming
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
2017-09-19 15:48 ` Mike Snitzer
2017-09-19 15:52 ` Bart Van Assche
@ 2017-09-19 16:07 ` Ming Lei
1 sibling, 0 replies; 20+ messages in thread
From: Ming Lei @ 2017-09-19 16:07 UTC (permalink / raw)
To: Mike Snitzer
Cc: Bart Van Assche, hch@infradead.org, jejb@linux.vnet.ibm.com,
sagi@grimberg.me, linux-scsi@vger.kernel.org, axboe@fb.com,
linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
dm-devel@redhat.com, martin.petersen@oracle.com,
loberman@redhat.com
On Tue, Sep 19, 2017 at 11:48:23AM -0400, Mike Snitzer wrote:
> On Tue, Sep 19 2017 at 1:43am -0400,
> Ming Lei <ming.lei@redhat.com> wrote:
>
> > On Mon, Sep 18, 2017 at 03:18:16PM +0000, Bart Van Assche wrote:
> > > On Sun, 2017-09-17 at 20:40 +0800, Ming Lei wrote:
> > > > "if no request has completed before the delay has expired" can't be a
> > > > reason to rerun the queue, because the queue can still be busy.
> > >
> > > That statement of you shows that there are important aspects of the SCSI
> > > core and dm-mpath driver that you don't understand.
> >
> > Then can you tell me why blk-mq's SCHED_RESTART can't cover
> > the rerun when there are in-flight requests? What is the case
> > in which dm-rq can return BUSY while there aren't any in-flight
> > requests at the same time?
> >
> > Also, you are the author of the 'blk_mq_delay_run_hw_queue(
> > hctx, 100/*ms*/)' call in dm-rq, yet you never explained in commit
> > 6077c2d706097c0 ("dm rq: Avoid that request processing stalls
> > sporadically") what the root cause of your request stall is or
> > why that patch fixes your issue. You don't even explain
> > why the delay is 100ms.
> >
> > So it is a workaround, isn't it?
> >
> > My concern is that it isn't good to add blk_mq_delay_run_hw_queue(hctx, 100/*ms*/)
> > in the hot path, since that case should already be covered by SCHED_RESTART
> > when there are in-flight requests.
>
> This thread proves that it is definitely brittle to be relying on fixed
> delays like this:
> https://patchwork.kernel.org/patch/9703249/
I can't agree more; no one has explained the root cause, and maybe the
request stall has been fixed recently. Keeping the workaround in the
hot path is a bit annoying.
--
Ming
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
2017-09-19 16:04 ` Ming Lei
@ 2017-09-19 16:49 ` Bart Van Assche
2017-09-19 16:55 ` Ming Lei
2017-09-20 1:19 ` Ming Lei
0 siblings, 2 replies; 20+ messages in thread
From: Bart Van Assche @ 2017-09-19 16:49 UTC (permalink / raw)
To: ming.lei@redhat.com, snitzer@redhat.com
Cc: linux-block@vger.kernel.org, hch@infradead.org, sagi@grimberg.me,
martin.petersen@oracle.com, linux-scsi@vger.kernel.org,
axboe@fb.com, linux-nvme@lists.infradead.org,
jejb@linux.vnet.ibm.com, loberman@redhat.com, dm-devel@redhat.com
On Wed, 2017-09-20 at 00:04 +0800, Ming Lei wrote:
> Run queue at end_io is definitely wrong, because blk-mq has SCHED_RESTART
> to do that already.
Sorry but I disagree. If SCHED_RESTART is set that causes the blk-mq core to
reexamine the software queues and the hctx dispatch list but not the requeue
list. If a block driver returns BLK_STS_RESOURCE then requests end up on the
requeue list. Hence the following code in scsi_end_request():
	if (scsi_target(sdev)->single_lun ||
	    !list_empty(&sdev->host->starved_list))
		kblockd_schedule_work(&sdev->requeue_work);
	else
		blk_mq_run_hw_queues(q, true);
Bart.
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
2017-09-19 16:49 ` Bart Van Assche
@ 2017-09-19 16:55 ` Ming Lei
2017-09-19 18:42 ` Bart Van Assche
2017-09-20 1:19 ` Ming Lei
1 sibling, 1 reply; 20+ messages in thread
From: Ming Lei @ 2017-09-19 16:55 UTC (permalink / raw)
To: Bart Van Assche
Cc: ming.lei@redhat.com, snitzer@redhat.com,
linux-block@vger.kernel.org, hch@infradead.org, sagi@grimberg.me,
martin.petersen@oracle.com, linux-scsi@vger.kernel.org,
axboe@fb.com, linux-nvme@lists.infradead.org,
jejb@linux.vnet.ibm.com, loberman@redhat.com, dm-devel@redhat.com
On Wed, Sep 20, 2017 at 12:49 AM, Bart Van Assche
<Bart.VanAssche@wdc.com> wrote:
> On Wed, 2017-09-20 at 00:04 +0800, Ming Lei wrote:
>> Run queue at end_io is definitely wrong, because blk-mq has SCHED_RESTART
>> to do that already.
>
> Sorry but I disagree. If SCHED_RESTART is set that causes the blk-mq core to
> reexamine the software queues and the hctx dispatch list but not the requeue
> list. If a block driver returns BLK_STS_RESOURCE then requests end up on the
> requeue list. Hence the following code in scsi_end_request():
That doesn't need SCHED_RESTART, because it is the requeue path's
responsibility to do that:
see blk_mq_requeue_work(), which runs the hw queue at the end of that
function.
--
Ming Lei
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
2017-09-19 16:55 ` Ming Lei
@ 2017-09-19 18:42 ` Bart Van Assche
2017-09-19 22:44 ` Ming Lei
0 siblings, 1 reply; 20+ messages in thread
From: Bart Van Assche @ 2017-09-19 18:42 UTC (permalink / raw)
To: tom.leiming@gmail.com
Cc: linux-block@vger.kernel.org, hch@infradead.org, sagi@grimberg.me,
snitzer@redhat.com, martin.petersen@oracle.com,
ming.lei@redhat.com, linux-scsi@vger.kernel.org, axboe@fb.com,
linux-nvme@lists.infradead.org, jejb@linux.vnet.ibm.com,
loberman@redhat.com, dm-devel@redhat.com
On Wed, 2017-09-20 at 00:55 +0800, Ming Lei wrote:
> On Wed, Sep 20, 2017 at 12:49 AM, Bart Van Assche
> <Bart.VanAssche@wdc.com> wrote:
> > On Wed, 2017-09-20 at 00:04 +0800, Ming Lei wrote:
> > > Run queue at end_io is definitely wrong, because blk-mq has SCHED_RESTART
> > > to do that already.
> >
> > Sorry but I disagree. If SCHED_RESTART is set that causes the blk-mq core to
> > reexamine the software queues and the hctx dispatch list but not the requeue
> > list. If a block driver returns BLK_STS_RESOURCE then requests end up on the
> > requeue list. Hence the following code in scsi_end_request():
>
> That doesn't need SCHED_RESTART, because it is the requeue path's
> responsibility to do that:
> see blk_mq_requeue_work(), which runs the hw queue at the end of that
> function.
That's not what I was trying to explain. What I was trying to explain is that
every block driver that can cause a request to end up on the requeue list is
responsible for kicking the requeue list at a later time. Hence the
kblockd_schedule_work(&sdev->requeue_work) call in the SCSI core and the
blk_mq_kick_requeue_list() and blk_mq_delay_kick_requeue_list() calls in the
dm code. What I would like to see is measurement results for dm-mpath without
this patch series and a call to kick the requeue list added to the dm-mpath
end_io code.
Bart.
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
2017-09-19 18:42 ` Bart Van Assche
@ 2017-09-19 22:44 ` Ming Lei
2017-09-19 23:25 ` Bart Van Assche
0 siblings, 1 reply; 20+ messages in thread
From: Ming Lei @ 2017-09-19 22:44 UTC (permalink / raw)
To: Bart Van Assche
Cc: tom.leiming@gmail.com, linux-block@vger.kernel.org,
hch@infradead.org, sagi@grimberg.me, snitzer@redhat.com,
martin.petersen@oracle.com, linux-scsi@vger.kernel.org,
axboe@fb.com, linux-nvme@lists.infradead.org,
jejb@linux.vnet.ibm.com, loberman@redhat.com, dm-devel@redhat.com
On Tue, Sep 19, 2017 at 06:42:30PM +0000, Bart Van Assche wrote:
> On Wed, 2017-09-20 at 00:55 +0800, Ming Lei wrote:
> > On Wed, Sep 20, 2017 at 12:49 AM, Bart Van Assche
> > <Bart.VanAssche@wdc.com> wrote:
> > > On Wed, 2017-09-20 at 00:04 +0800, Ming Lei wrote:
> > > > Run queue at end_io is definitely wrong, because blk-mq has SCHED_RESTART
> > > > to do that already.
> > >
> > > Sorry but I disagree. If SCHED_RESTART is set that causes the blk-mq core to
> > > reexamine the software queues and the hctx dispatch list but not the requeue
> > > list. If a block driver returns BLK_STS_RESOURCE then requests end up on the
> > > requeue list. Hence the following code in scsi_end_request():
> >
> > That doesn't need SCHED_RESTART, because it is the requeue path's
> > responsibility to do that:
> > see blk_mq_requeue_work(), which runs the hw queue at the end of that
> > function.
>
> That's not what I was trying to explain. What I was trying to explain is that
> every block driver that can cause a request to end up on the requeue list is
> responsible for kicking the requeue list at a later time. Hence the
> kblockd_schedule_work(&sdev->requeue_work) call in the SCSI core and the
> blk_mq_kick_requeue_list() and blk_mq_delay_kick_requeue_list() calls in the
> dm code. What I would like to see is measurement results for dm-mpath without
> this patch series and a call to kick the requeue list added to the dm-mpath
> end_io code.
For this issue, SCSI and dm-rq are not the same.

We don't need to run the queue in dm's .end_io, and the reasoning is
simple; otherwise this wouldn't be a performance issue, it would be an
I/O hang:

1) every dm-rq request is mapped 1:1 to a SCSI request

2) if any mapped SCSI request is not finished, whether in flight,
on the requeue list, or elsewhere, there will be one corresponding
dm-rq request in flight

3) once the mapped SCSI request completes, dm-rq's completion
path is triggered and dm-rq's queue is rerun because of
SCHED_RESTART in dm-rq

So the hw queue of dm-rq has already been run in dm-rq's completion
path, right? Why do we need to do it again in the hot path?
--
Ming
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
2017-09-19 22:44 ` Ming Lei
@ 2017-09-19 23:25 ` Bart Van Assche
2017-09-19 23:50 ` Mike Snitzer
0 siblings, 1 reply; 20+ messages in thread
From: Bart Van Assche @ 2017-09-19 23:25 UTC (permalink / raw)
To: ming.lei@redhat.com
Cc: linux-block@vger.kernel.org, hch@infradead.org,
tom.leiming@gmail.com, sagi@grimberg.me, snitzer@redhat.com,
martin.petersen@oracle.com, linux-scsi@vger.kernel.org,
axboe@fb.com, linux-nvme@lists.infradead.org,
jejb@linux.vnet.ibm.com, loberman@redhat.com, dm-devel@redhat.com
On Wed, 2017-09-20 at 06:44 +0800, Ming Lei wrote:
> For this issue, SCSI and dm-rq are not the same.
>
> We don't need to run the queue in dm's .end_io, and the reasoning is
> simple; if it were wrong, we would see an I/O hang, not just a
> performance issue.
>
> 1) every dm-rq request is mapped 1:1 to a SCSI request
>
> 2) if any mapped SCSI request is not finished, whether in flight,
> on the requeue list, or elsewhere, there is one corresponding
> dm-rq request in flight
>
> 3) once the mapped SCSI request completes, dm-rq's completion
> path is triggered and dm-rq's queue is rerun because of
> SCHED_RESTART
>
> So the hw queue of dm-rq has already been run in dm-rq's completion
> path, right? Why do we need to do it again in the hot path?
The measurement data in the description of patch 5/5 shows a significant
performance regression for an important workload, namely random I/O.
Additionally, the performance improvement for sequential I/O was achieved
for an unrealistically low queue depth. Sorry but given these measurement
results I don't see why I should spend more time on this patch series.
Bart.
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
2017-09-19 23:25 ` Bart Van Assche
@ 2017-09-19 23:50 ` Mike Snitzer
2017-09-20 1:13 ` Ming Lei
0 siblings, 1 reply; 20+ messages in thread
From: Mike Snitzer @ 2017-09-19 23:50 UTC (permalink / raw)
To: Bart Van Assche
Cc: ming.lei@redhat.com, linux-block@vger.kernel.org,
hch@infradead.org, tom.leiming@gmail.com, sagi@grimberg.me,
martin.petersen@oracle.com, linux-scsi@vger.kernel.org,
axboe@fb.com, linux-nvme@lists.infradead.org,
jejb@linux.vnet.ibm.com, loberman@redhat.com, dm-devel@redhat.com
On Tue, Sep 19 2017 at 7:25pm -0400,
Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> On Wed, 2017-09-20 at 06:44 +0800, Ming Lei wrote:
> > For this issue, SCSI and dm-rq are not the same.
> >
> > We don't need to run the queue in dm's .end_io, and the reasoning is
> > simple; if it were wrong, we would see an I/O hang, not just a
> > performance issue.
> >
> > 1) every dm-rq request is mapped 1:1 to a SCSI request
> >
> > 2) if any mapped SCSI request is not finished, whether in flight,
> > on the requeue list, or elsewhere, there is one corresponding
> > dm-rq request in flight
> >
> > 3) once the mapped SCSI request completes, dm-rq's completion
> > path is triggered and dm-rq's queue is rerun because of
> > SCHED_RESTART
> >
> > So the hw queue of dm-rq has already been run in dm-rq's completion
> > path, right? Why do we need to do it again in the hot path?
>
> The measurement data in the description of patch 5/5 shows a significant
> performance regression for an important workload, namely random I/O.
> Additionally, the performance improvement for sequential I/O was achieved
> for an unrealistically low queue depth.
So you've ignored Ming's question entirely and instead decided to focus
on previous questions you raised to Ming that he ignored. This is
getting tedious.
Especially given that Ming said the first patch, which all this fighting
has been over, isn't even required to attain the improvements.
Ming, please retest both your baseline and patched setups with a
queue_depth of >= 32. Also, please do 3-5 runs to get an avg and std
dev across the runs.
> Sorry but given these measurement results I don't see why I should
> spend more time on this patch series.
Bart, I've always genuinely appreciated your review and insight. But if
your future "review" of this patchset would take the form shown in this
thread, then please don't spend more time on it.
Mike
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
2017-09-19 23:50 ` Mike Snitzer
@ 2017-09-20 1:13 ` Ming Lei
0 siblings, 0 replies; 20+ messages in thread
From: Ming Lei @ 2017-09-20 1:13 UTC (permalink / raw)
To: Mike Snitzer
Cc: Bart Van Assche, linux-block@vger.kernel.org, hch@infradead.org,
tom.leiming@gmail.com, sagi@grimberg.me,
martin.petersen@oracle.com, linux-scsi@vger.kernel.org,
axboe@fb.com, linux-nvme@lists.infradead.org,
jejb@linux.vnet.ibm.com, loberman@redhat.com, dm-devel@redhat.com
Hi Mike,
On Tue, Sep 19, 2017 at 07:50:06PM -0400, Mike Snitzer wrote:
> On Tue, Sep 19 2017 at 7:25pm -0400,
> Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
>
> > On Wed, 2017-09-20 at 06:44 +0800, Ming Lei wrote:
> > > For this issue, SCSI and dm-rq are not the same.
> > >
> > > We don't need to run the queue in dm's .end_io, and the reasoning is
> > > simple; if it were wrong, we would see an I/O hang, not just a
> > > performance issue.
> > >
> > > 1) every dm-rq request is mapped 1:1 to a SCSI request
> > >
> > > 2) if any mapped SCSI request is not finished, whether in flight,
> > > on the requeue list, or elsewhere, there is one corresponding
> > > dm-rq request in flight
> > >
> > > 3) once the mapped SCSI request completes, dm-rq's completion
> > > path is triggered and dm-rq's queue is rerun because of
> > > SCHED_RESTART
> > >
> > > So the hw queue of dm-rq has already been run in dm-rq's completion
> > > path, right? Why do we need to do it again in the hot path?
> >
> > The measurement data in the description of patch 5/5 shows a significant
> > performance regression for an important workload, namely random I/O.
> > Additionally, the performance improvement for sequential I/O was achieved
> > for an unrealistically low queue depth.
>
> So you've ignored Ming's question entirely and instead decided to focus
> on previous questions you raised to Ming that he ignored. This is
> getting tedious.
Sorry for not making that clear; I mentioned I will post a new version
to address the random I/O regression.
>
> Especially given that Ming said the first patch, which all this fighting
> has been over, isn't even required to attain the improvements.
>
> Ming, please retest both your baseline and patched setups with a
> queue_depth of >= 32. Also, please do 3-5 runs to get an avg and std
> dev across the runs.
Taking a bigger queue_depth won't help with this issue, and it can
make the situation worse: .cmd_per_lun won't change, and the queue
often becomes busy when the number of in-flight requests is bigger
than .cmd_per_lun.
I will post a new version, which uses another simple way to
figure out whether the underlying queue is busy, so that random I/O
performance won't be affected; this new version depends on the
following patchset:
https://marc.info/?t=150436555700002&r=1&w=2
So it may take a while since that patchset is still under review.
I will post them all together in 'blk-mq-sched: improve SCSI-MQ performance (V5)'.
The approach taken in patch 5 depends on q->queue_depth, but some SCSI
hosts' .cmd_per_lun differs from q->queue_depth, which causes
the random I/O regression.
--
Ming
* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
2017-09-19 16:49 ` Bart Van Assche
2017-09-19 16:55 ` Ming Lei
@ 2017-09-20 1:19 ` Ming Lei
1 sibling, 0 replies; 20+ messages in thread
From: Ming Lei @ 2017-09-20 1:19 UTC (permalink / raw)
To: Bart Van Assche
Cc: snitzer@redhat.com, linux-block@vger.kernel.org,
hch@infradead.org, sagi@grimberg.me, martin.petersen@oracle.com,
linux-scsi@vger.kernel.org, axboe@fb.com,
linux-nvme@lists.infradead.org, jejb@linux.vnet.ibm.com,
loberman@redhat.com, dm-devel@redhat.com
On Tue, Sep 19, 2017 at 04:49:15PM +0000, Bart Van Assche wrote:
> On Wed, 2017-09-20 at 00:04 +0800, Ming Lei wrote:
> > Running the queue at end_io is definitely wrong, because blk-mq already
> > has SCHED_RESTART to do that.
>
> Sorry but I disagree. If SCHED_RESTART is set that causes the blk-mq core to
> reexamine the software queues and the hctx dispatch list but not the requeue
> list. If a block driver returns BLK_STS_RESOURCE then requests end up on the
> requeue list. Hence the following code in scsi_end_request():
>
> if (scsi_target(sdev)->single_lun || !list_empty(&sdev->host->starved_list))
> kblockd_schedule_work(&sdev->requeue_work);
> else
> blk_mq_run_hw_queues(q, true);
Let's focus on dm-rq.
As I explained before, dm-rq is different from SCSI: we no longer need
to requeue the request in dm-rq if blk_get_request() returns NULL
in multipath_clone_and_map(), since SCHED_RESTART definitely covers
that.
Not to mention that dm_mq_delay_requeue_request() will run the queue
explicitly when a requeue does have to happen; that doesn't need
SCHED_RESTART to cover it, right?
--
Ming
end of thread, other threads:[~2017-09-20 1:19 UTC | newest]
Thread overview: 20+ messages
[not found] <20170915164456.9803-1-ming.lei@redhat.com>
2017-09-15 16:44 ` [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE Ming Lei
2017-09-15 17:57 ` Bart Van Assche
2017-09-17 12:40 ` Ming Lei
2017-09-18 15:18 ` Bart Van Assche
2017-09-19 5:43 ` Ming Lei
2017-09-19 15:36 ` Bart Van Assche
2017-09-19 15:56 ` Mike Snitzer
2017-09-19 16:04 ` Ming Lei
2017-09-19 16:49 ` Bart Van Assche
2017-09-19 16:55 ` Ming Lei
2017-09-19 18:42 ` Bart Van Assche
2017-09-19 22:44 ` Ming Lei
2017-09-19 23:25 ` Bart Van Assche
2017-09-19 23:50 ` Mike Snitzer
2017-09-20 1:13 ` Ming Lei
2017-09-20 1:19 ` Ming Lei
2017-09-19 15:48 ` Mike Snitzer
2017-09-19 15:52 ` Bart Van Assche
2017-09-19 16:03 ` Mike Snitzer
2017-09-19 16:07 ` Ming Lei