linux-block.vger.kernel.org archive mirror
* [PATCH v2 00/13] A few bugfix and cleanup patches for blk-mq
@ 2023-01-04 14:22 Kemeng Shi
  2023-01-04 14:22 ` [PATCH v2 01/13] blk-mq: avoid sleep in blk_mq_alloc_request_hctx Kemeng Shi
                   ` (12 more replies)
  0 siblings, 13 replies; 25+ messages in thread
From: Kemeng Shi @ 2023-01-04 14:22 UTC (permalink / raw)
  To: axboe, dwagner, hare, ming.lei, linux-block, linux-kernel
  Cc: hch, john.garry, jack

Hi, this series contains several bugfix patches to fix potential I/O
hangs and a few cleanup patches to remove stale code and unnecessary
checks. Most changes are in the request issue and dispatch paths. Thanks.

---
V2:
 -Thanks to Christoph for the review; there are two fixes in v2 based
on his recommendations.
  1) Avoid an overly long line in patch "blk-mq: avoid sleep in
blk_mq_alloc_request_hctx"
  2) Check BLK_MQ_REQ_NOWAIT and BLK_MQ_REQ_RESERVED in two separate
WARN_ON_ONCE calls
---

Kemeng Shi (13):
  blk-mq: avoid sleep in blk_mq_alloc_request_hctx
  blk-mq: remove stale comment for blk_mq_sched_mark_restart_hctx
  blk-mq: wait on correct sbitmap_queue in blk_mq_mark_tag_wait
  blk-mq: Fix potential io hung for shared sbitmap per tagset
  blk-mq: remove unnecessary list_empty check in
    blk_mq_try_issue_list_directly
  blk-mq: remove unnecessary error count and flush in
    blk_mq_plug_issue_direct
  blk-mq: remove error count and unnecessary flush in
    blk_mq_try_issue_list_directly
  blk-mq: simplify flush check in blk_mq_dispatch_rq_list
  blk-mq: remove unnecessary error count and check in
    blk_mq_dispatch_rq_list
  blk-mq: remove set of bd->last when get driver tag for next request
    fails
  blk-mq: remove unnecessary from_schedule parameter in
    blk_mq_plug_issue_direct
  blk-mq: use switch/case to improve readability in
    blk_mq_try_issue_list_directly
  blk-mq: correct stale comment of .get_budget

 block/blk-mq-sched.c |   7 ++-
 block/blk-mq.c       | 105 +++++++++++++++++++------------------------
 2 files changed, 48 insertions(+), 64 deletions(-)

-- 
2.30.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 01/13] blk-mq: avoid sleep in blk_mq_alloc_request_hctx
  2023-01-04 14:22 [PATCH v2 00/13] A few bugfix and cleanup patches for blk-mq Kemeng Shi
@ 2023-01-04 14:22 ` Kemeng Shi
  2023-01-08 17:55   ` Christoph Hellwig
  2023-01-04 14:22 ` [PATCH v2 02/13] blk-mq: remove stale comment for blk_mq_sched_mark_restart_hctx Kemeng Shi
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 25+ messages in thread
From: Kemeng Shi @ 2023-01-04 14:22 UTC (permalink / raw)
  To: axboe, dwagner, hare, ming.lei, linux-block, linux-kernel
  Cc: hch, john.garry, jack

Commit 1f5bd336b9150 ("blk-mq: add blk_mq_alloc_request_hctx") added
blk_mq_alloc_request_hctx to send commands to a specific queue. If
BLK_MQ_REQ_NOWAIT is not set in tag allocation, we may migrate to a
different hctx after sleeping and get a tag from an unexpected hctx, so
BLK_MQ_REQ_NOWAIT must be set in flags for blk_mq_alloc_request_hctx.
After commit 600c3b0cea784 ("blk-mq: open code __blk_mq_alloc_request in
blk_mq_alloc_request_hctx"), blk_mq_alloc_request_hctx returns -EINVAL
only if neither BLK_MQ_REQ_NOWAIT nor BLK_MQ_REQ_RESERVED is set,
instead of whenever BLK_MQ_REQ_NOWAIT is not set. So if BLK_MQ_REQ_NOWAIT
is not set but BLK_MQ_REQ_RESERVED is, blk_mq_alloc_request_hctx could
allocate a tag from an unexpected hctx. What we need here is to return
-EINVAL if either BLK_MQ_REQ_NOWAIT or BLK_MQ_REQ_RESERVED is not set.

Currently, both BLK_MQ_REQ_NOWAIT and BLK_MQ_REQ_RESERVED are set when
a specific hctx is needed in nvme_auth_submit, nvmf_connect_io_queue
and nvmf_connect_admin_queue, so this fixes the potential case of a
missing BLK_MQ_REQ_NOWAIT in future callers.

Fixes: 600c3b0cea78 ("blk-mq: open code __blk_mq_alloc_request in blk_mq_alloc_request_hctx")
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 block/blk-mq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4e6b3ccd4989..42bb59fa275c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -657,7 +657,8 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
 	 * allocator for this for the rare use case of a command tied to
 	 * a specific queue.
 	 */
-	if (WARN_ON_ONCE(!(flags & (BLK_MQ_REQ_NOWAIT | BLK_MQ_REQ_RESERVED))))
+	if (WARN_ON_ONCE(!(flags & BLK_MQ_REQ_NOWAIT)) ||
+	    WARN_ON_ONCE(!(flags & BLK_MQ_REQ_RESERVED)))
 		return ERR_PTR(-EINVAL);
 
 	if (hctx_idx >= q->nr_hw_queues)
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 02/13] blk-mq: remove stale comment for blk_mq_sched_mark_restart_hctx
  2023-01-04 14:22 [PATCH v2 00/13] A few bugfix and cleanup patches for blk-mq Kemeng Shi
  2023-01-04 14:22 ` [PATCH v2 01/13] blk-mq: avoid sleep in blk_mq_alloc_request_hctx Kemeng Shi
@ 2023-01-04 14:22 ` Kemeng Shi
  2023-01-08 17:55   ` Christoph Hellwig
  2023-01-04 14:22 ` [PATCH v2 03/13] blk-mq: wait on correct sbitmap_queue in blk_mq_mark_tag_wait Kemeng Shi
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 25+ messages in thread
From: Kemeng Shi @ 2023-01-04 14:22 UTC (permalink / raw)
  To: axboe, dwagner, hare, ming.lei, linux-block, linux-kernel
  Cc: hch, john.garry, jack

Commit 97889f9ac24f8 ("blk-mq: remove synchronize_rcu() from
blk_mq_del_queue_tag_set()") removed the TAG_SHARED handling in restart,
and with it shared_hctx_restart, which counted how many hardware queues
were marked for restart.
Remove the stale comment claiming that we still count hardware queues
that need a restart.

Fixes: 97889f9ac24f ("blk-mq: remove synchronize_rcu() from blk_mq_del_queue_tag_set()")
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 block/blk-mq-sched.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 23d1a90fec42..ae40cdb7a383 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -19,8 +19,7 @@
 #include "blk-wbt.h"
 
 /*
- * Mark a hardware queue as needing a restart. For shared queues, maintain
- * a count of how many hardware queues are marked for restart.
+ * Mark a hardware queue as needing a restart.
  */
 void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx)
 {
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 03/13] blk-mq: wait on correct sbitmap_queue in blk_mq_mark_tag_wait
  2023-01-04 14:22 [PATCH v2 00/13] A few bugfix and cleanup patches for blk-mq Kemeng Shi
  2023-01-04 14:22 ` [PATCH v2 01/13] blk-mq: avoid sleep in blk_mq_alloc_request_hctx Kemeng Shi
  2023-01-04 14:22 ` [PATCH v2 02/13] blk-mq: remove stale comment for blk_mq_sched_mark_restart_hctx Kemeng Shi
@ 2023-01-04 14:22 ` Kemeng Shi
  2023-01-08 17:55   ` Christoph Hellwig
  2023-01-04 14:22 ` [PATCH v2 04/13] blk-mq: Fix potential io hung for shared sbitmap per tagset Kemeng Shi
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 25+ messages in thread
From: Kemeng Shi @ 2023-01-04 14:22 UTC (permalink / raw)
  To: axboe, dwagner, hare, ming.lei, linux-block, linux-kernel
  Cc: hch, john.garry, jack

In the shared queues case, we only wait on bitmap_tags when we fail to
get a driver tag. However, the rq could come from breserved_tags, in
which case two problems occur:
1. an I/O hang if no tag is currently allocated from bitmap_tags.
2. an unnecessary wakeup when a tag is freed to bitmap_tags while no
tag is freed to breserved_tags.
Fix this by waiting on the sbitmap_queue the rq actually came from.

Fixes: f906a6a0f426 ("blk-mq: improve tag waiting setup for non-shared tags")
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 block/blk-mq.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 42bb59fa275c..ec958aa044ba 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1820,7 +1820,7 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
 static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
 				 struct request *rq)
 {
-	struct sbitmap_queue *sbq = &hctx->tags->bitmap_tags;
+	struct sbitmap_queue *sbq;
 	struct wait_queue_head *wq;
 	wait_queue_entry_t *wait;
 	bool ret;
@@ -1843,6 +1843,10 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
 	if (!list_empty_careful(&wait->entry))
 		return false;
 
+	if (blk_mq_tag_is_reserved(rq->mq_hctx->sched_tags, rq->internal_tag))
+		sbq = &hctx->tags->breserved_tags;
+	else
+		sbq = &hctx->tags->bitmap_tags;
 	wq = &bt_wait_ptr(sbq, hctx)->wait;
 
 	spin_lock_irq(&wq->lock);
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 04/13] blk-mq: Fix potential io hung for shared sbitmap per tagset
  2023-01-04 14:22 [PATCH v2 00/13] A few bugfix and cleanup patches for blk-mq Kemeng Shi
                   ` (2 preceding siblings ...)
  2023-01-04 14:22 ` [PATCH v2 03/13] blk-mq: wait on correct sbitmap_queue in blk_mq_mark_tag_wait Kemeng Shi
@ 2023-01-04 14:22 ` Kemeng Shi
  2023-01-04 14:22 ` [PATCH v2 05/13] blk-mq: remove unnecessary list_empty check in blk_mq_try_issue_list_directly Kemeng Shi
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Kemeng Shi @ 2023-01-04 14:22 UTC (permalink / raw)
  To: axboe, dwagner, hare, ming.lei, linux-block, linux-kernel
  Cc: hch, john.garry, jack

Commit f906a6a0f4268 ("blk-mq: improve tag waiting setup for non-shared
tags") only marks restart for unshared tags as an improvement. At that
time, tags were only shared between queues, and whether tags are shared
could be checked by testing BLK_MQ_F_TAG_SHARED.
Later, commit 32bc15afed04b ("blk-mq: Facilitate a shared sbitmap per
tagset") enabled sharing tags between hctxs inside a queue. We still
only mark restart for hctxs sharing tags inside a queue, which may
cause an I/O hang if no tag is currently allocated by the hctx about to
be marked for restart.
Fix this by waiting on the sbitmap_queue instead of marking restart for
the shared-hctxs case as well.

Fixes: 32bc15afed04 ("blk-mq: Facilitate a shared sbitmap per tagset")
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 block/blk-mq.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index ec958aa044ba..9c3a9b5e5b0c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1825,7 +1825,8 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
 	wait_queue_entry_t *wait;
 	bool ret;
 
-	if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {
+	if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED) &&
+	    !(blk_mq_is_shared_tags(hctx->flags))) {
 		blk_mq_sched_mark_restart_hctx(hctx);
 
 		/*
@@ -2095,7 +2096,8 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
 		bool needs_restart;
 		/* For non-shared tags, the RESTART check will suffice */
 		bool no_tag = prep == PREP_DISPATCH_NO_TAG &&
-			(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED);
+			((hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED) ||
+			blk_mq_is_shared_tags(hctx->flags));
 
 		if (nr_budgets)
 			blk_mq_release_budgets(q, list);
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 05/13] blk-mq: remove unnecessary list_empty check in blk_mq_try_issue_list_directly
  2023-01-04 14:22 [PATCH v2 00/13] A few bugfix and cleanup patches for blk-mq Kemeng Shi
                   ` (3 preceding siblings ...)
  2023-01-04 14:22 ` [PATCH v2 04/13] blk-mq: Fix potential io hung for shared sbitmap per tagset Kemeng Shi
@ 2023-01-04 14:22 ` Kemeng Shi
  2023-01-08 17:56   ` Christoph Hellwig
  2023-01-04 14:22 ` [PATCH v2 06/13] blk-mq: remove unnecessary error count and flush in blk_mq_plug_issue_direct Kemeng Shi
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 25+ messages in thread
From: Kemeng Shi @ 2023-01-04 14:22 UTC (permalink / raw)
  To: axboe, dwagner, hare, ming.lei, linux-block, linux-kernel
  Cc: hch, john.garry, jack

We only break the list walk when we get a 'BLK_STS_*RESOURCE' error,
and we also count errors for 'BLK_STS_*RESOURCE'. So if the list is not
empty, errors is always non-zero, and the list_empty check can be
removed. This also drops the redundant list_empty check for the case
where the error happened while sending the last request in the list.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 block/blk-mq.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 9c3a9b5e5b0c..d84ce1f758ce 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2833,8 +2833,7 @@ void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
 	 * the driver there was more coming, but that turned out to
 	 * be a lie.
 	 */
-	if ((!list_empty(list) || errors) &&
-	     hctx->queue->mq_ops->commit_rqs && queued)
+	if (errors && hctx->queue->mq_ops->commit_rqs && queued)
 		hctx->queue->mq_ops->commit_rqs(hctx);
 }
 
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 06/13] blk-mq: remove unnecessary error count and flush in blk_mq_plug_issue_direct
  2023-01-04 14:22 [PATCH v2 00/13] A few bugfix and cleanup patches for blk-mq Kemeng Shi
                   ` (4 preceding siblings ...)
  2023-01-04 14:22 ` [PATCH v2 05/13] blk-mq: remove unnecessary list_empty check in blk_mq_try_issue_list_directly Kemeng Shi
@ 2023-01-04 14:22 ` Kemeng Shi
  2023-01-08 18:02   ` Christoph Hellwig
  2023-01-04 14:22 ` [PATCH v2 07/13] blk-mq: remove error count and unnecessary flush in blk_mq_try_issue_list_directly Kemeng Shi
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 25+ messages in thread
From: Kemeng Shi @ 2023-01-04 14:22 UTC (permalink / raw)
  To: axboe, dwagner, hare, ming.lei, linux-block, linux-kernel
  Cc: hch, john.garry, jack

blk_mq_plug_issue_direct tries to send a list of requests which belong
to different hctxs. Normally, we send a flush when the hctx changes, as
there may be no more requests for the same hctx. Besides, a flush is
sent along with the last request in the list by setting the last
parameter of blk_mq_request_issue_directly.

An extra flush is needed for two cases:
1. We stop sending in the middle of the list, so the normal flush sent
after the last request of the current hctx is missed.
2. An error happens while sending the last request, so the normal flush
may be lost.

In blk_mq_plug_issue_direct, we only break the list walk when we get a
BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE error, and we already send an
extra flush for that case.
We currently count the number of errors and send an extra flush if it
is non-zero after sending all requests in the list. This covers case 2
described above, but there are two things to improve:
1. If the last request is sent successfully, an error on a request in
the middle of the list triggers an unnecessary flush.
2. We only need the error of the last request instead of the error
count, and the error of the last request can simply be read from ret.

Cover case 2 above by simply checking the ret of the last request, and
remove the unnecessary error count and flush to improve
blk_mq_plug_issue_direct.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 block/blk-mq.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index d84ce1f758ce..ba917b6b5cc1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2687,11 +2687,10 @@ static void blk_mq_plug_issue_direct(struct blk_plug *plug, bool from_schedule)
 	struct blk_mq_hw_ctx *hctx = NULL;
 	struct request *rq;
 	int queued = 0;
-	int errors = 0;
+	blk_status_t ret;
 
 	while ((rq = rq_list_pop(&plug->mq_list))) {
 		bool last = rq_list_empty(plug->mq_list);
-		blk_status_t ret;
 
 		if (hctx != rq->mq_hctx) {
 			if (hctx)
@@ -2711,7 +2710,6 @@ static void blk_mq_plug_issue_direct(struct blk_plug *plug, bool from_schedule)
 			return;
 		default:
 			blk_mq_end_request(rq, ret);
-			errors++;
 			break;
 		}
 	}
@@ -2720,7 +2718,7 @@ static void blk_mq_plug_issue_direct(struct blk_plug *plug, bool from_schedule)
 	 * If we didn't flush the entire list, we could have told the driver
 	 * there was more coming, but that turned out to be a lie.
 	 */
-	if (errors)
+	if (ret != BLK_STS_OK)
 		blk_mq_commit_rqs(hctx, &queued, from_schedule);
 }
 
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 07/13] blk-mq: remove error count and unnecessary flush in blk_mq_try_issue_list_directly
  2023-01-04 14:22 [PATCH v2 00/13] A few bugfix and cleanup patches for blk-mq Kemeng Shi
                   ` (5 preceding siblings ...)
  2023-01-04 14:22 ` [PATCH v2 06/13] blk-mq: remove unnecessary error count and flush in blk_mq_plug_issue_direct Kemeng Shi
@ 2023-01-04 14:22 ` Kemeng Shi
  2023-01-08 18:03   ` Christoph Hellwig
  2023-01-04 14:22 ` [PATCH v2 08/13] blk-mq: simplify flush check in blk_mq_dispatch_rq_list Kemeng Shi
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 25+ messages in thread
From: Kemeng Shi @ 2023-01-04 14:22 UTC (permalink / raw)
  To: axboe, dwagner, hare, ming.lei, linux-block, linux-kernel
  Cc: hch, john.garry, jack

blk_mq_try_issue_list_directly tries to send a list of requests
belonging to the same hctx to the driver. Normally, a flush is sent
along with the last request in the list by setting the last parameter
of blk_mq_request_issue_directly.
An extra flush is needed for two cases:
1. We stop sending in the middle of the list, so the normal flush along
with the last request is not sent.
2. An error happens while sending the last request, so the normal flush
may be lost.

We only break the list walk when we get BLK_STS_RESOURCE or
BLK_STS_DEV_RESOURCE, which is stored in ret. So for case 1, we can
simply check ret and send an extra flush if ret is not BLK_STS_OK.
For case 2, the error of the last request in the list is also stored in
ret, so we can likewise check ret and send an extra flush if it is not
BLK_STS_OK.

Then the error count is no longer needed, and an error in the middle of
the list no longer triggers an unnecessary extra flush.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 block/blk-mq.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index ba917b6b5cc1..a9e88037550b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2804,17 +2804,15 @@ void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
 		struct list_head *list)
 {
 	int queued = 0;
-	int errors = 0;
+	blk_status_t ret;
 
 	while (!list_empty(list)) {
-		blk_status_t ret;
 		struct request *rq = list_first_entry(list, struct request,
 				queuelist);
 
 		list_del_init(&rq->queuelist);
 		ret = blk_mq_request_issue_directly(rq, list_empty(list));
 		if (ret != BLK_STS_OK) {
-			errors++;
 			if (ret == BLK_STS_RESOURCE ||
 					ret == BLK_STS_DEV_RESOURCE) {
 				blk_mq_request_bypass_insert(rq, false,
@@ -2831,7 +2829,7 @@ void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
 	 * the driver there was more coming, but that turned out to
 	 * be a lie.
 	 */
-	if (errors && hctx->queue->mq_ops->commit_rqs && queued)
+	if (ret != BLK_STS_OK && hctx->queue->mq_ops->commit_rqs && queued)
 		hctx->queue->mq_ops->commit_rqs(hctx);
 }
 
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 08/13] blk-mq: simplify flush check in blk_mq_dispatch_rq_list
  2023-01-04 14:22 [PATCH v2 00/13] A few bugfix and cleanup patches for blk-mq Kemeng Shi
                   ` (6 preceding siblings ...)
  2023-01-04 14:22 ` [PATCH v2 07/13] blk-mq: remove error count and unnecessary flush in blk_mq_try_issue_list_directly Kemeng Shi
@ 2023-01-04 14:22 ` Kemeng Shi
  2023-01-08 18:06   ` Christoph Hellwig
  2023-01-04 14:22 ` [PATCH v2 09/13] blk-mq: remove unnecessary error count and " Kemeng Shi
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 25+ messages in thread
From: Kemeng Shi @ 2023-01-04 14:22 UTC (permalink / raw)
  To: axboe, dwagner, hare, ming.lei, linux-block, linux-kernel
  Cc: hch, john.garry, jack

For the busy errors BLK_STS_*RESOURCE, the request is always added back
to the list, so needs_resource cannot be true and ret cannot be
BLK_STS_DEV_RESOURCE if the list is empty. These dead checks can be
removed.
If the list is empty, we only need to send an extra flush when an error
happened on the last request in the list, which is stored in ret. So
send the extra flush when ret is not BLK_STS_OK, instead of when errors
is non-zero, to avoid an unnecessary flush for an error on a request in
the middle of the list.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 block/blk-mq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index a9e88037550b..c543c14fdb47 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2085,8 +2085,8 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
 	/* If we didn't flush the entire list, we could have told the driver
 	 * there was more coming, but that turned out to be a lie.
 	 */
-	if ((!list_empty(list) || errors || needs_resource ||
-	     ret == BLK_STS_DEV_RESOURCE) && q->mq_ops->commit_rqs && queued)
+	if ((!list_empty(list) || ret != BLK_STS_OK) &&
+	     q->mq_ops->commit_rqs && queued)
 		q->mq_ops->commit_rqs(hctx);
 	/*
 	 * Any items that need requeuing? Stuff them into hctx->dispatch,
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 09/13] blk-mq: remove unnecessary error count and check in blk_mq_dispatch_rq_list
  2023-01-04 14:22 [PATCH v2 00/13] A few bugfix and cleanup patches for blk-mq Kemeng Shi
                   ` (7 preceding siblings ...)
  2023-01-04 14:22 ` [PATCH v2 08/13] blk-mq: simplify flush check in blk_mq_dispatch_rq_list Kemeng Shi
@ 2023-01-04 14:22 ` Kemeng Shi
  2023-01-04 14:22 ` [PATCH v2 10/13] blk-mq: remove set of bd->last when get driver tag for next request fails Kemeng Shi
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Kemeng Shi @ 2023-01-04 14:22 UTC (permalink / raw)
  To: axboe, dwagner, hare, ming.lei, linux-block, linux-kernel
  Cc: hch, john.garry, jack

blk_mq_dispatch_rq_list reports via its bool return value whether the
hctx is busy: it returns true if we are not busy and can handle more,
and false otherwise. Inside blk_mq_dispatch_rq_list, we return true if
the list is empty and (errors + queued) != 0.

For the busy errors BLK_STS_*RESOURCE, the failed request is added back
to the list, so the list will not be empty. We count queued for
BLK_STS_OK and errors for every other error except the busy errors.
So if the list is empty, (errors + queued) equals the total number of
requests in the list, which was checked to be non-empty at the
beginning of blk_mq_dispatch_rq_list. Hence (errors + queued) != 0
always holds once all requests are handled, and both the check and the
errors count can be removed.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 block/blk-mq.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index c543c14fdb47..c1d4d899f059 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2010,7 +2010,7 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
 	enum prep_dispatch prep;
 	struct request_queue *q = hctx->queue;
 	struct request *rq, *nxt;
-	int errors, queued;
+	int queued;
 	blk_status_t ret = BLK_STS_OK;
 	LIST_HEAD(zone_list);
 	bool needs_resource = false;
@@ -2021,7 +2021,7 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
 	/*
 	 * Now process all the entries, sending them to the driver.
 	 */
-	errors = queued = 0;
+	queued = 0;
 	do {
 		struct blk_mq_queue_data bd;
 
@@ -2074,7 +2074,6 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
 			needs_resource = true;
 			break;
 		default:
-			errors++;
 			blk_mq_end_request(rq, ret);
 		}
 	} while (!list_empty(list));
@@ -2152,10 +2151,10 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
 
 		blk_mq_update_dispatch_busy(hctx, true);
 		return false;
-	} else
-		blk_mq_update_dispatch_busy(hctx, false);
+	}
 
-	return (queued + errors) != 0;
+	blk_mq_update_dispatch_busy(hctx, false);
+	return true;
 }
 
 /**
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 10/13] blk-mq: remove set of bd->last when get driver tag for next request fails
  2023-01-04 14:22 [PATCH v2 00/13] A few bugfix and cleanup patches for blk-mq Kemeng Shi
                   ` (8 preceding siblings ...)
  2023-01-04 14:22 ` [PATCH v2 09/13] blk-mq: remove unnecessary error count and " Kemeng Shi
@ 2023-01-04 14:22 ` Kemeng Shi
  2023-01-04 14:22 ` [PATCH v2 11/13] blk-mq: remove unnecessary from_schedule parameter in blk_mq_plug_issue_direct Kemeng Shi
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Kemeng Shi @ 2023-01-04 14:22 UTC (permalink / raw)
  To: axboe, dwagner, hare, ming.lei, linux-block, linux-kernel
  Cc: hch, john.garry, jack

Commit 113285b473824 ("blk-mq: ensure that bd->last is always set
correctly") sets last when we fail to get a driver tag for the next
request, to avoid missing a flush: we break the list walk and will not
send the last request in the list, which would normally carry the last
flag.
This code is stale now because the flush it introduces is always
redundant:
1. If tags are really exhausted, we send an extra flush anyway when the
list is found non-empty after the list walk.
2. If a tag is freed before the retry in blk_mq_prep_dispatch_rq for
the next request, we get a tag for it on retry, and the flush already
notified is unnecessary.

Just remove this stale code.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 block/blk-mq.c | 24 ++----------------------
 1 file changed, 2 insertions(+), 22 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index c1d4d899f059..882c03a3f0aa 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1917,16 +1917,6 @@ static void blk_mq_update_dispatch_busy(struct blk_mq_hw_ctx *hctx, bool busy)
 static void blk_mq_handle_dev_resource(struct request *rq,
 				       struct list_head *list)
 {
-	struct request *next =
-		list_first_entry_or_null(list, struct request, queuelist);
-
-	/*
-	 * If an I/O scheduler has been configured and we got a driver tag for
-	 * the next request already, free it.
-	 */
-	if (next)
-		blk_mq_put_driver_tag(next);
-
 	list_add(&rq->queuelist, list);
 	__blk_mq_requeue_request(rq);
 }
@@ -2009,7 +1999,7 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
 {
 	enum prep_dispatch prep;
 	struct request_queue *q = hctx->queue;
-	struct request *rq, *nxt;
+	struct request *rq;
 	int queued;
 	blk_status_t ret = BLK_STS_OK;
 	LIST_HEAD(zone_list);
@@ -2035,17 +2025,7 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
 		list_del_init(&rq->queuelist);
 
 		bd.rq = rq;
-
-		/*
-		 * Flag last if we have no more requests, or if we have more
-		 * but can't assign a driver tag to it.
-		 */
-		if (list_empty(list))
-			bd.last = true;
-		else {
-			nxt = list_first_entry(list, struct request, queuelist);
-			bd.last = !blk_mq_get_driver_tag(nxt);
-		}
+		bd.last = list_empty(list);
 
 		/*
 		 * once the request is queued to lld, no need to cover the
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 11/13] blk-mq: remove unnecessary from_schedule parameter in blk_mq_plug_issue_direct
  2023-01-04 14:22 [PATCH v2 00/13] A few bugfix and cleanup patches for blk-mq Kemeng Shi
                   ` (9 preceding siblings ...)
  2023-01-04 14:22 ` [PATCH v2 10/13] blk-mq: remove set of bd->last when get driver tag for next request fails Kemeng Shi
@ 2023-01-04 14:22 ` Kemeng Shi
  2023-01-08 18:06   ` Christoph Hellwig
  2023-01-04 14:22 ` [PATCH v2 12/13] blk-mq: use switch/case to improve readability in blk_mq_try_issue_list_directly Kemeng Shi
  2023-01-04 14:22 ` [PATCH v2 13/13] blk-mq: correct stale comment of .get_budget Kemeng Shi
  12 siblings, 1 reply; 25+ messages in thread
From: Kemeng Shi @ 2023-01-04 14:22 UTC (permalink / raw)
  To: axboe, dwagner, hare, ming.lei, linux-block, linux-kernel
  Cc: hch, john.garry, jack

Function blk_mq_plug_issue_direct tries to issue the batched requests
on the plug list to the driver directly. We only issue plugged requests
to the driver when we are not coming from the scheduler, so the
from_schedule parameter of blk_mq_plug_issue_direct is always false, as
is the one of blk_mq_commit_rqs, which is only called from
blk_mq_plug_issue_direct.
Remove the unnecessary from_schedule parameter from
blk_mq_plug_issue_direct and blk_mq_commit_rqs.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 block/blk-mq.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 882c03a3f0aa..696bd4a82b14 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2528,11 +2528,10 @@ void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
 	spin_unlock(&ctx->lock);
 }
 
-static void blk_mq_commit_rqs(struct blk_mq_hw_ctx *hctx, int *queued,
-			      bool from_schedule)
+static void blk_mq_commit_rqs(struct blk_mq_hw_ctx *hctx, int *queued)
 {
 	if (hctx->queue->mq_ops->commit_rqs) {
-		trace_block_unplug(hctx->queue, *queued, !from_schedule);
+		trace_block_unplug(hctx->queue, *queued, true);
 		hctx->queue->mq_ops->commit_rqs(hctx);
 	}
 	*queued = 0;
@@ -2661,7 +2660,7 @@ static blk_status_t blk_mq_request_issue_directly(struct request *rq, bool last)
 	return __blk_mq_try_issue_directly(rq->mq_hctx, rq, true, last);
 }
 
-static void blk_mq_plug_issue_direct(struct blk_plug *plug, bool from_schedule)
+static void blk_mq_plug_issue_direct(struct blk_plug *plug)
 {
 	struct blk_mq_hw_ctx *hctx = NULL;
 	struct request *rq;
@@ -2673,7 +2672,7 @@ static void blk_mq_plug_issue_direct(struct blk_plug *plug, bool from_schedule)
 
 		if (hctx != rq->mq_hctx) {
 			if (hctx)
-				blk_mq_commit_rqs(hctx, &queued, from_schedule);
+				blk_mq_commit_rqs(hctx, &queued);
 			hctx = rq->mq_hctx;
 		}
 
@@ -2685,7 +2684,7 @@ static void blk_mq_plug_issue_direct(struct blk_plug *plug, bool from_schedule)
 		case BLK_STS_RESOURCE:
 		case BLK_STS_DEV_RESOURCE:
 			blk_mq_request_bypass_insert(rq, false, true);
-			blk_mq_commit_rqs(hctx, &queued, from_schedule);
+			blk_mq_commit_rqs(hctx, &queued);
 			return;
 		default:
 			blk_mq_end_request(rq, ret);
@@ -2698,7 +2697,7 @@ static void blk_mq_plug_issue_direct(struct blk_plug *plug, bool from_schedule)
 	 * there was more coming, but that turned out to be a lie.
 	 */
 	if (ret != BLK_STS_OK)
-		blk_mq_commit_rqs(hctx, &queued, from_schedule);
+		blk_mq_commit_rqs(hctx, &queued);
 }
 
 static void __blk_mq_flush_plug_list(struct request_queue *q,
@@ -2769,7 +2768,7 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
 		}
 
 		blk_mq_run_dispatch_ops(q,
-				blk_mq_plug_issue_direct(plug, false));
+				blk_mq_plug_issue_direct(plug));
 		if (rq_list_empty(plug->mq_list))
 			return;
 	}
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 12/13] blk-mq: use switch/case to improve readability in blk_mq_try_issue_list_directly
  2023-01-04 14:22 [PATCH v2 00/13] A few bugfix and cleanup patches for blk-mq Kemeng Shi
                   ` (10 preceding siblings ...)
  2023-01-04 14:22 ` [PATCH v2 11/13] blk-mq: remove unnecessary from_schedule parameter in blk_mq_plug_issue_direct Kemeng Shi
@ 2023-01-04 14:22 ` Kemeng Shi
  2023-01-04 14:22 ` [PATCH v2 13/13] blk-mq: correct stale comment of .get_budget Kemeng Shi
  12 siblings, 0 replies; 25+ messages in thread
From: Kemeng Shi @ 2023-01-04 14:22 UTC (permalink / raw)
  To: axboe, dwagner, hare, ming.lei, linux-block, linux-kernel
  Cc: hch, john.garry, jack

Use switch/case to handle errors, as other functions do, to improve
readability in blk_mq_try_issue_list_directly.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 block/blk-mq.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 696bd4a82b14..64fa78d25d8e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2790,16 +2790,21 @@ void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
 
 		list_del_init(&rq->queuelist);
 		ret = blk_mq_request_issue_directly(rq, list_empty(list));
-		if (ret != BLK_STS_OK) {
-			if (ret == BLK_STS_RESOURCE ||
-					ret == BLK_STS_DEV_RESOURCE) {
-				blk_mq_request_bypass_insert(rq, false,
-							list_empty(list));
-				break;
-			}
-			blk_mq_end_request(rq, ret);
-		} else
+		switch (ret) {
+		case BLK_STS_OK:
 			queued++;
+			break;
+		case BLK_STS_RESOURCE:
+		case BLK_STS_DEV_RESOURCE:
+			blk_mq_request_bypass_insert(rq, false,
+						     list_empty(list));
+			if (hctx->queue->mq_ops->commit_rqs && queued)
+				hctx->queue->mq_ops->commit_rqs(hctx);
+			return;
+		default:
+			blk_mq_end_request(rq, ret);
+			break;
+		}
 	}
 
 	/*
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 13/13] blk-mq: correct stale comment of .get_budget
  2023-01-04 14:22 [PATCH v2 00/13] A few bugfix and cleanup patches for blk-mq Kemeng Shi
                   ` (11 preceding siblings ...)
  2023-01-04 14:22 ` [PATCH v2 12/13] blk-mq: use switch/case to improve readability in blk_mq_try_issue_list_directly Kemeng Shi
@ 2023-01-04 14:22 ` Kemeng Shi
  12 siblings, 0 replies; 25+ messages in thread
From: Kemeng Shi @ 2023-01-04 14:22 UTC (permalink / raw)
  To: axboe, dwagner, hare, ming.lei, linux-block, linux-kernel
  Cc: hch, john.garry, jack

Commit 88022d7201e96 ("blk-mq: don't handle failure in .get_budget")
removed the BLK_STS_RESOURCE return value, and now we only check
whether we can get a budget from .get_budget().
Correct the stale comment ".get_budget() returns BLK_STS_NO_RESOURCE"
to ".get_budget() fails to get the budget".

Fixes: 88022d7201e9 ("blk-mq: don't handle failure in .get_budget")
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 block/blk-mq-sched.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index ae40cdb7a383..06b312c69114 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -81,7 +81,7 @@ static bool blk_mq_dispatch_hctx_list(struct list_head *rq_list)
 /*
  * Only SCSI implements .get_budget and .put_budget, and SCSI restarts
  * its queue by itself in its completion handler, so we don't need to
- * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
+ * restart queue if .get_budget() fails to get the budget.
  *
  * Returns -EAGAIN if hctx->dispatch was found non-empty and run_work has to
  * be run again.  This is necessary to avoid starving flushes.
@@ -209,7 +209,7 @@ static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
 /*
  * Only SCSI implements .get_budget and .put_budget, and SCSI restarts
  * its queue by itself in its completion handler, so we don't need to
- * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
+ * restart queue if .get_budget() fails to get the budget.
  *
  * Returns -EAGAIN if hctx->dispatch was found non-empty and run_work has to
  * be run again.  This is necessary to avoid starving flushes.
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 01/13] blk-mq: avoid sleep in blk_mq_alloc_request_hctx
  2023-01-04 14:22 ` [PATCH v2 01/13] blk-mq: avoid sleep in blk_mq_alloc_request_hctx Kemeng Shi
@ 2023-01-08 17:55   ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2023-01-08 17:55 UTC (permalink / raw)
  To: Kemeng Shi
  Cc: axboe, dwagner, hare, ming.lei, linux-block, linux-kernel, hch,
	john.garry, jack

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 02/13] blk-mq: remove stale comment for blk_mq_sched_mark_restart_hctx
  2023-01-04 14:22 ` [PATCH v2 02/13] blk-mq: remove stale comment for blk_mq_sched_mark_restart_hctx Kemeng Shi
@ 2023-01-08 17:55   ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2023-01-08 17:55 UTC (permalink / raw)
  To: Kemeng Shi
  Cc: axboe, dwagner, hare, ming.lei, linux-block, linux-kernel, hch,
	john.garry, jack

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 03/13] blk-mq: wait on correct sbitmap_queue in blk_mq_mark_tag_wait
  2023-01-04 14:22 ` [PATCH v2 03/13] blk-mq: wait on correct sbitmap_queue in blk_mq_mark_tag_wait Kemeng Shi
@ 2023-01-08 17:55   ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2023-01-08 17:55 UTC (permalink / raw)
  To: Kemeng Shi
  Cc: axboe, dwagner, hare, ming.lei, linux-block, linux-kernel, hch,
	john.garry, jack

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 05/13] blk-mq: remove unnecessary list_empty check in blk_mq_try_issue_list_directly
  2023-01-04 14:22 ` [PATCH v2 05/13] blk-mq: remove unnecessary list_empty check in blk_mq_try_issue_list_directly Kemeng Shi
@ 2023-01-08 17:56   ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2023-01-08 17:56 UTC (permalink / raw)
  To: Kemeng Shi
  Cc: axboe, dwagner, hare, ming.lei, linux-block, linux-kernel, hch,
	john.garry, jack

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 06/13] blk-mq: remove unncessary error count and flush in blk_mq_plug_issue_direct
  2023-01-04 14:22 ` [PATCH v2 06/13] blk-mq: remove unncessary error count and flush in blk_mq_plug_issue_direct Kemeng Shi
@ 2023-01-08 18:02   ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2023-01-08 18:02 UTC (permalink / raw)
  To: Kemeng Shi
  Cc: axboe, dwagner, hare, ming.lei, linux-block, linux-kernel, hch,
	john.garry, jack

I'm really confused by this.  Why do we need the extra commit
anyway if the errored command has never made it to the device?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 07/13] blk-mq: remove error count and unncessary flush in blk_mq_try_issue_list_directly
  2023-01-04 14:22 ` [PATCH v2 07/13] blk-mq: remove error count and unncessary flush in blk_mq_try_issue_list_directly Kemeng Shi
@ 2023-01-08 18:03   ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2023-01-08 18:03 UTC (permalink / raw)
  To: Kemeng Shi
  Cc: axboe, dwagner, hare, ming.lei, linux-block, linux-kernel, hch,
	john.garry, jack

Similar comment as for the previous patch, just complicated by the
excursion to blk_mq_request_bypass_insert.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 08/13] blk-mq: simplify flush check in blk_mq_dispatch_rq_list
  2023-01-04 14:22 ` [PATCH v2 08/13] blk-mq: simplify flush check in blk_mq_dispatch_rq_list Kemeng Shi
@ 2023-01-08 18:06   ` Christoph Hellwig
  2023-01-09  2:27     ` Kemeng Shi
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2023-01-08 18:06 UTC (permalink / raw)
  To: Kemeng Shi
  Cc: axboe, dwagner, hare, ming.lei, linux-block, linux-kernel, hch,
	john.garry, jack

I think we need to come up with a clear rule on when commit_rqs
needs to be called, and follow that.  In this case I'd be confused
if there was any case where we need to call it if list was empty.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 11/13] blk-mq: remove unncessary from_schedule parameter in blk_mq_plug_issue_direct
  2023-01-04 14:22 ` [PATCH v2 11/13] blk-mq: remove unncessary from_schedule parameter in blk_mq_plug_issue_direct Kemeng Shi
@ 2023-01-08 18:06   ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2023-01-08 18:06 UTC (permalink / raw)
  To: Kemeng Shi
  Cc: axboe, dwagner, hare, ming.lei, linux-block, linux-kernel, hch,
	john.garry, jack

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 08/13] blk-mq: simplify flush check in blk_mq_dispatch_rq_list
  2023-01-08 18:06   ` Christoph Hellwig
@ 2023-01-09  2:27     ` Kemeng Shi
  2023-01-10  8:09       ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Kemeng Shi @ 2023-01-09  2:27 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, dwagner, hare, ming.lei, linux-block, linux-kernel,
	john.garry, jack


Hi, Christoph, thank you so much for review.
on 1/9/2023 2:06 AM, Christoph Hellwig wrote:
> I think we need to come up with a clear rule on when commit_rqs
> needs to be called, and follow that.  In this case I'd be confused
> if there was any case where we need to call it if list was empty.
> 
After we queue request[s] to one driver queue, we need to notify the
driver that there are no more requests for the queue, or the driver will
keep waiting for the last request to be queued and an IO hang could
happen.
Normally, we notify this by setting .last in struct blk_mq_queue_data
along with the last request in .rq of struct blk_mq_queue_data. The
extra commit is only needed if the normal last information in .last is
lost. (See the comment on commit_rqs in struct blk_mq_ops.)

The loss can occur if an error happens when sending the last request
with .last set, or if an error happens in the middle of the list and we
never even send the request with .last set.

-- 
Best wishes
Kemeng Shi


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 08/13] blk-mq: simplify flush check in blk_mq_dispatch_rq_list
  2023-01-09  2:27     ` Kemeng Shi
@ 2023-01-10  8:09       ` Christoph Hellwig
  2023-01-10 12:35         ` Kemeng Shi
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2023-01-10  8:09 UTC (permalink / raw)
  To: Kemeng Shi
  Cc: Christoph Hellwig, axboe, dwagner, hare, ming.lei, linux-block,
	linux-kernel, john.garry, jack

On Mon, Jan 09, 2023 at 10:27:33AM +0800, Kemeng Shi wrote:
> After we queue request[s] to one driver queue, we need to notify the
> driver that there are no more requests for the queue, or the driver will
> keep waiting for the last request to be queued and an IO hang could happen.

Yes.

> Normally, we notify this by setting .last in struct blk_mq_queue_data
> along with the last request in .rq of struct blk_mq_queue_data. The
> extra commit is only needed if the normal last information in .last is
> lost. (See the comment on commit_rqs in struct blk_mq_ops.)
> 
> The loss can occur if an error happens when sending the last request
> with .last set, or if an error happens in the middle of the list and we
> never even send the request with .last set.

Yes. So the rule is:

 1) did not queue everything initially scheduled to queue

OR

 2) the last attempt to queue a request failed

I think we need to find a way to clearly document that and make all
callers match it.

For most this becomes a

	if (ret || !list_empty(list))

or even just

	if (ret)

as an error is often the only way to break out of the submission
loop.

I wonder if we need to split the queued clearing from blk_mq_commit_rqs
and just clear it in the existing callers, so that we can use that
helpers for all commits, nicely hiding the ->commit_rqs presence
check, and then move that call to where it is needed directly.  Something
like this untested patch (which needs to be split up), which also
makes sure we trace these calls consistently:

---
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c5cf0dbca1db8d..436ca56a0b7172 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2001,6 +2001,15 @@ static void blk_mq_release_budgets(struct request_queue *q,
 	}
 }
 
+static void blk_mq_commit_rqs(struct blk_mq_hw_ctx *hctx, int queued,
+			      bool from_schedule)
+{
+	if (queued && hctx->queue->mq_ops->commit_rqs) {
+		trace_block_unplug(hctx->queue, queued, !from_schedule);
+		hctx->queue->mq_ops->commit_rqs(hctx);
+	}
+}
+
 /*
  * Returns true if we did some work AND can potentially do more.
  */
@@ -2082,12 +2091,9 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
 	if (!list_empty(&zone_list))
 		list_splice_tail_init(&zone_list, list);
 
-	/* If we didn't flush the entire list, we could have told the driver
-	 * there was more coming, but that turned out to be a lie.
-	 */
-	if ((!list_empty(list) || errors || needs_resource ||
-	     ret == BLK_STS_DEV_RESOURCE) && q->mq_ops->commit_rqs && queued)
-		q->mq_ops->commit_rqs(hctx);
+	if (!list_empty(list) || ret)
+		blk_mq_commit_rqs(hctx, queued, false);
+
 	/*
 	 * Any items that need requeuing? Stuff them into hctx->dispatch,
 	 * that is where we will continue on next queue run.
@@ -2548,16 +2554,6 @@ void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
 	spin_unlock(&ctx->lock);
 }
 
-static void blk_mq_commit_rqs(struct blk_mq_hw_ctx *hctx, int *queued,
-			      bool from_schedule)
-{
-	if (hctx->queue->mq_ops->commit_rqs) {
-		trace_block_unplug(hctx->queue, *queued, !from_schedule);
-		hctx->queue->mq_ops->commit_rqs(hctx);
-	}
-	*queued = 0;
-}
-
 static void blk_mq_bio_to_request(struct request *rq, struct bio *bio,
 		unsigned int nr_segs)
 {
@@ -2684,17 +2680,17 @@ static blk_status_t blk_mq_request_issue_directly(struct request *rq, bool last)
 static void blk_mq_plug_issue_direct(struct blk_plug *plug, bool from_schedule)
 {
 	struct blk_mq_hw_ctx *hctx = NULL;
+	blk_status_t ret = BLK_STS_OK;
 	struct request *rq;
 	int queued = 0;
-	int errors = 0;
 
 	while ((rq = rq_list_pop(&plug->mq_list))) {
 		bool last = rq_list_empty(plug->mq_list);
-		blk_status_t ret;
 
 		if (hctx != rq->mq_hctx) {
 			if (hctx)
-				blk_mq_commit_rqs(hctx, &queued, from_schedule);
+				blk_mq_commit_rqs(hctx, queued, from_schedule);
+			queued = 0;
 			hctx = rq->mq_hctx;
 		}
 
@@ -2706,21 +2702,15 @@ static void blk_mq_plug_issue_direct(struct blk_plug *plug, bool from_schedule)
 		case BLK_STS_RESOURCE:
 		case BLK_STS_DEV_RESOURCE:
 			blk_mq_request_bypass_insert(rq, false, true);
-			blk_mq_commit_rqs(hctx, &queued, from_schedule);
-			return;
+			goto out;
 		default:
 			blk_mq_end_request(rq, ret);
-			errors++;
 			break;
 		}
 	}
-
-	/*
-	 * If we didn't flush the entire list, we could have told the driver
-	 * there was more coming, but that turned out to be a lie.
-	 */
-	if (errors)
-		blk_mq_commit_rqs(hctx, &queued, from_schedule);
+out:
+	if (ret)
+		blk_mq_commit_rqs(hctx, queued, from_schedule);
 }
 
 static void __blk_mq_flush_plug_list(struct request_queue *q,
@@ -2804,37 +2794,33 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
 void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
 		struct list_head *list)
 {
+	blk_status_t ret = BLK_STS_OK;
+	struct request *rq;
 	int queued = 0;
-	int errors = 0;
-
-	while (!list_empty(list)) {
-		blk_status_t ret;
-		struct request *rq = list_first_entry(list, struct request,
-				queuelist);
+	bool last;
 
+	while ((rq = list_first_entry_or_null(list, struct request,
+			queuelist))) {
 		list_del_init(&rq->queuelist);
-		ret = blk_mq_request_issue_directly(rq, list_empty(list));
-		if (ret != BLK_STS_OK) {
-			errors++;
-			if (ret == BLK_STS_RESOURCE ||
-					ret == BLK_STS_DEV_RESOURCE) {
-				blk_mq_request_bypass_insert(rq, false,
-							list_empty(list));
-				break;
-			}
-			blk_mq_end_request(rq, ret);
-		} else
+		last = list_empty(list);
+
+		ret = blk_mq_request_issue_directly(rq, last);
+		switch (ret) {
+		case BLK_STS_OK:
 			queued++;
+			break;
+		case BLK_STS_RESOURCE:
+		case BLK_STS_DEV_RESOURCE:
+			blk_mq_request_bypass_insert(rq, false, last);
+			goto out;
+		default:
+			blk_mq_end_request(rq, ret);
+			break;
+		}
 	}
-
-	/*
-	 * If we didn't flush the entire list, we could have told
-	 * the driver there was more coming, but that turned out to
-	 * be a lie.
-	 */
-	if ((!list_empty(list) || errors) &&
-	     hctx->queue->mq_ops->commit_rqs && queued)
-		hctx->queue->mq_ops->commit_rqs(hctx);
+out:
+	if (ret)
+		blk_mq_commit_rqs(hctx, queued, false);
 }
 
 static bool blk_mq_attempt_bio_merge(struct request_queue *q,

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 08/13] blk-mq: simplify flush check in blk_mq_dispatch_rq_list
  2023-01-10  8:09       ` Christoph Hellwig
@ 2023-01-10 12:35         ` Kemeng Shi
  0 siblings, 0 replies; 25+ messages in thread
From: Kemeng Shi @ 2023-01-10 12:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, dwagner, hare, ming.lei, linux-block, linux-kernel,
	john.garry, jack



on 1/10/2023 4:09 PM, Christoph Hellwig wrote:
> On Mon, Jan 09, 2023 at 10:27:33AM +0800, Kemeng Shi wrote:
>> After we queue request[s] to one driver queue, we need to notify the
>> driver that there are no more requests for the queue, or the driver will
>> keep waiting for the last request to be queued and an IO hang could happen.
> 
> Yes.
> 
>> Normally, we notify this by setting .last in struct blk_mq_queue_data
>> along with the last request in .rq of struct blk_mq_queue_data. The
>> extra commit is only needed if the normal last information in .last is
>> lost. (See the comment on commit_rqs in struct blk_mq_ops.)
>>
>> The loss can occur if an error happens when sending the last request
>> with .last set, or if an error happens in the middle of the list and we
>> never even send the request with .last set.
> 
> Yes. So the rule is:
> 
>  1) did not queue everything initially scheduled to queue
> 
> OR
> 
>  2) the last attempt to queue a request failed
> 
> I think we need to find a way to clearly document that and make all
> callers match it.
> For most this becomes a
> 
> 	if (ret || !list_empty(list))
> 
> or even just
> 
> 	if (ret)
> 
> as an error is often the only way to break out of the submission
> loop.
> 
> I wonder if we need to split the queued clearing from blk_mq_commit_rqs
> and just clear it in the existing callers, so that we can use that
> helpers for all commits, nicely hiding the ->commit_rqs presence
> check, and then move that call to where it is needed directly.  Something
> like this untested patch (which needs to be split up), which also
> makes sure we trace these calls consistently:
Yes, using the helper also makes the queued check consistent. Currently,
most code only calls commit_rqs if any request was queued; one exception
is that blk_mq_plug_issue_direct calls commit_rqs without checking queued.
Besides, we can document the rule before blk_mq_commit_rqs, so any
future caller can notice the rule and match it.
I will send the next version based on the suggested helper.
Thanks.
> ---
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index c5cf0dbca1db8d..436ca56a0b7172 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2001,6 +2001,15 @@ static void blk_mq_release_budgets(struct request_queue *q,
>  	}
>  }
>  
> +static void blk_mq_commit_rqs(struct blk_mq_hw_ctx *hctx, int queued,
> +			      bool from_schedule)
> +{
> +	if (queued && hctx->queue->mq_ops->commit_rqs) {
> +		trace_block_unplug(hctx->queue, queued, !from_schedule);
> +		hctx->queue->mq_ops->commit_rqs(hctx);
> +	}
> +}
> +
>  /*
>   * Returns true if we did some work AND can potentially do more.
>   */
> @@ -2082,12 +2091,9 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
>  	if (!list_empty(&zone_list))
>  		list_splice_tail_init(&zone_list, list);
>  
> -	/* If we didn't flush the entire list, we could have told the driver
> -	 * there was more coming, but that turned out to be a lie.
> -	 */
> -	if ((!list_empty(list) || errors || needs_resource ||
> -	     ret == BLK_STS_DEV_RESOURCE) && q->mq_ops->commit_rqs && queued)
> -		q->mq_ops->commit_rqs(hctx);
> +	if (!list_empty(list) || ret)
> +		blk_mq_commit_rqs(hctx, queued, false);
> +
>  	/*
>  	 * Any items that need requeuing? Stuff them into hctx->dispatch,
>  	 * that is where we will continue on next queue run.
> @@ -2548,16 +2554,6 @@ void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
>  	spin_unlock(&ctx->lock);
>  }
>  
> -static void blk_mq_commit_rqs(struct blk_mq_hw_ctx *hctx, int *queued,
> -			      bool from_schedule)
> -{
> -	if (hctx->queue->mq_ops->commit_rqs) {
> -		trace_block_unplug(hctx->queue, *queued, !from_schedule);
> -		hctx->queue->mq_ops->commit_rqs(hctx);
> -	}
> -	*queued = 0;
> -}
> -
>  static void blk_mq_bio_to_request(struct request *rq, struct bio *bio,
>  		unsigned int nr_segs)
>  {
> @@ -2684,17 +2680,17 @@ static blk_status_t blk_mq_request_issue_directly(struct request *rq, bool last)
>  static void blk_mq_plug_issue_direct(struct blk_plug *plug, bool from_schedule)
>  {
>  	struct blk_mq_hw_ctx *hctx = NULL;
> +	blk_status_t ret = BLK_STS_OK;
>  	struct request *rq;
>  	int queued = 0;
> -	int errors = 0;
>  
>  	while ((rq = rq_list_pop(&plug->mq_list))) {
>  		bool last = rq_list_empty(plug->mq_list);
> -		blk_status_t ret;
>  
>  		if (hctx != rq->mq_hctx) {
>  			if (hctx)
> -				blk_mq_commit_rqs(hctx, &queued, from_schedule);
> +				blk_mq_commit_rqs(hctx, queued, from_schedule);
> +			queued = 0;
>  			hctx = rq->mq_hctx;
>  		}
>  
> @@ -2706,21 +2702,15 @@ static void blk_mq_plug_issue_direct(struct blk_plug *plug, bool from_schedule)
>  		case BLK_STS_RESOURCE:
>  		case BLK_STS_DEV_RESOURCE:
>  			blk_mq_request_bypass_insert(rq, false, true);
> -			blk_mq_commit_rqs(hctx, &queued, from_schedule);
> -			return;
> +			goto out;
>  		default:
>  			blk_mq_end_request(rq, ret);
> -			errors++;
>  			break;
>  		}
>  	}
> -
> -	/*
> -	 * If we didn't flush the entire list, we could have told the driver
> -	 * there was more coming, but that turned out to be a lie.
> -	 */
> -	if (errors)
> -		blk_mq_commit_rqs(hctx, &queued, from_schedule);
> +out:
> +	if (ret)
> +		blk_mq_commit_rqs(hctx, queued, from_schedule);
>  }
>  
>  static void __blk_mq_flush_plug_list(struct request_queue *q,
> @@ -2804,37 +2794,33 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
>  void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
>  		struct list_head *list)
>  {
> +	blk_status_t ret = BLK_STS_OK;
> +	struct request *rq;
>  	int queued = 0;
> -	int errors = 0;
> -
> -	while (!list_empty(list)) {
> -		blk_status_t ret;
> -		struct request *rq = list_first_entry(list, struct request,
> -				queuelist);
> +	bool last;
>  
> +	while ((rq = list_first_entry_or_null(list, struct request,
> +			queuelist))) {
>  		list_del_init(&rq->queuelist);
> -		ret = blk_mq_request_issue_directly(rq, list_empty(list));
> -		if (ret != BLK_STS_OK) {
> -			errors++;
> -			if (ret == BLK_STS_RESOURCE ||
> -					ret == BLK_STS_DEV_RESOURCE) {
> -				blk_mq_request_bypass_insert(rq, false,
> -							list_empty(list));
> -				break;
> -			}
> -			blk_mq_end_request(rq, ret);
> -		} else
> +		last = list_empty(list);
> +
> +		ret = blk_mq_request_issue_directly(rq, last);
> +		switch (ret) {
> +		case BLK_STS_OK:
>  			queued++;
> +			break;
> +		case BLK_STS_RESOURCE:
> +		case BLK_STS_DEV_RESOURCE:
> +			blk_mq_request_bypass_insert(rq, false, last);
> +			goto out;
> +		default:
> +			blk_mq_end_request(rq, ret);
> +			break;
> +		}
>  	}
> -
> -	/*
> -	 * If we didn't flush the entire list, we could have told
> -	 * the driver there was more coming, but that turned out to
> -	 * be a lie.
> -	 */
> -	if ((!list_empty(list) || errors) &&
> -	     hctx->queue->mq_ops->commit_rqs && queued)
> -		hctx->queue->mq_ops->commit_rqs(hctx);
> +out:
> +	if (ret)
> +		blk_mq_commit_rqs(hctx, queued, false);
>  }
>  
>  static bool blk_mq_attempt_bio_merge(struct request_queue *q,
> 

-- 
Best wishes
Kemeng Shi


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2023-01-10 12:35 UTC | newest]

Thread overview: 25+ messages
-- links below jump to the message on this page --
2023-01-04 14:22 [PATCH v2 00/13] A few bugfix and cleanup patches for blk-mq Kemeng Shi
2023-01-04 14:22 ` [PATCH v2 01/13] blk-mq: avoid sleep in blk_mq_alloc_request_hctx Kemeng Shi
2023-01-08 17:55   ` Christoph Hellwig
2023-01-04 14:22 ` [PATCH v2 02/13] blk-mq: remove stale comment for blk_mq_sched_mark_restart_hctx Kemeng Shi
2023-01-08 17:55   ` Christoph Hellwig
2023-01-04 14:22 ` [PATCH v2 03/13] blk-mq: wait on correct sbitmap_queue in blk_mq_mark_tag_wait Kemeng Shi
2023-01-08 17:55   ` Christoph Hellwig
2023-01-04 14:22 ` [PATCH v2 04/13] blk-mq: Fix potential io hung for shared sbitmap per tagset Kemeng Shi
2023-01-04 14:22 ` [PATCH v2 05/13] blk-mq: remove unnecessary list_empty check in blk_mq_try_issue_list_directly Kemeng Shi
2023-01-08 17:56   ` Christoph Hellwig
2023-01-04 14:22 ` [PATCH v2 06/13] blk-mq: remove unncessary error count and flush in blk_mq_plug_issue_direct Kemeng Shi
2023-01-08 18:02   ` Christoph Hellwig
2023-01-04 14:22 ` [PATCH v2 07/13] blk-mq: remove error count and unncessary flush in blk_mq_try_issue_list_directly Kemeng Shi
2023-01-08 18:03   ` Christoph Hellwig
2023-01-04 14:22 ` [PATCH v2 08/13] blk-mq: simplify flush check in blk_mq_dispatch_rq_list Kemeng Shi
2023-01-08 18:06   ` Christoph Hellwig
2023-01-09  2:27     ` Kemeng Shi
2023-01-10  8:09       ` Christoph Hellwig
2023-01-10 12:35         ` Kemeng Shi
2023-01-04 14:22 ` [PATCH v2 09/13] blk-mq: remove unnecessary error count and " Kemeng Shi
2023-01-04 14:22 ` [PATCH v2 10/13] blk-mq: remove set of bd->last when get driver tag for next request fails Kemeng Shi
2023-01-04 14:22 ` [PATCH v2 11/13] blk-mq: remove unncessary from_schedule parameter in blk_mq_plug_issue_direct Kemeng Shi
2023-01-08 18:06   ` Christoph Hellwig
2023-01-04 14:22 ` [PATCH v2 12/13] blk-mq: use switch/case to improve readability in blk_mq_try_issue_list_directly Kemeng Shi
2023-01-04 14:22 ` [PATCH v2 13/13] blk-mq: correct stale comment of .get_budget Kemeng Shi
