public inbox for linux-kernel@vger.kernel.org
* [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests
@ 2025-09-10  8:04 Yu Kuai
  2025-09-10  8:04 ` [PATCH v2 for-6.18/block 01/10] blk-mq: remove useless checking in queue_requests_store() Yu Kuai
                   ` (10 more replies)
  0 siblings, 11 replies; 18+ messages in thread
From: Yu Kuai @ 2025-09-10  8:04 UTC (permalink / raw)
  To: axboe, nilay, ming.lei
  Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
	johnny.chenyi

From: Yu Kuai <yukuai3@huawei.com>

Changes from v1:
 - add comments explaining why accessing q->elevator without holding
   elevator_lock is safe in patch 4;
 - add a non-shared tags check in patch 8;
 - add review tags from Nilay to patches 1-5, 7, 9 and 10;

Yu Kuai (10):
  blk-mq: remove useless checking in queue_requests_store()
  blk-mq: remove useless checkings in blk_mq_update_nr_requests()
  blk-mq: check invalid nr_requests in queue_requests_store()
  blk-mq: convert to serialize updating nr_requests with
    update_nr_hwq_lock
  blk-mq: cleanup shared tags case in blk_mq_update_nr_requests()
  blk-mq: split bitmap grow and resize case in
    blk_mq_update_nr_requests()
  blk-mq-sched: add new parameter nr_requests in
    blk_mq_alloc_sched_tags()
  blk-mq: fix potential deadlock while nr_requests grown
  blk-mq: remove blk_mq_tag_update_depth()
  blk-mq: fix stale nr_requests documentation

 Documentation/ABI/stable/sysfs-block | 14 ++----
 block/blk-mq-sched.c                 | 14 ++----
 block/blk-mq-sched.h                 |  2 +-
 block/blk-mq-tag.c                   | 53 ---------------------
 block/blk-mq.c                       | 71 ++++++++++++++--------------
 block/blk-mq.h                       | 18 +++++--
 block/blk-sysfs.c                    | 60 ++++++++++++++++++-----
 block/elevator.c                     |  3 +-
 8 files changed, 112 insertions(+), 123 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 for-6.18/block 01/10] blk-mq: remove useless checking in queue_requests_store()
  2025-09-10  8:04 [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests Yu Kuai
@ 2025-09-10  8:04 ` Yu Kuai
  2025-09-10  8:04 ` [PATCH v2 for-6.18/block 02/10] blk-mq: remove useless checkings in blk_mq_update_nr_requests() Yu Kuai
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Yu Kuai @ 2025-09-10  8:04 UTC (permalink / raw)
  To: axboe, nilay, ming.lei
  Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
	johnny.chenyi

From: Yu Kuai <yukuai3@huawei.com>

blk_mq_queue_attr_visible() already checks queue_is_mq(), so there is no
need to check it again in queue_requests_store().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
---
 block/blk-sysfs.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index c94b8b6ab024..1ffa65feca4f 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -69,9 +69,6 @@ queue_requests_store(struct gendisk *disk, const char *page, size_t count)
 	unsigned int memflags;
 	struct request_queue *q = disk->queue;
 
-	if (!queue_is_mq(q))
-		return -EINVAL;
-
 	ret = queue_var_store(&nr, page, count);
 	if (ret < 0)
 		return ret;
-- 
2.39.2



* [PATCH v2 for-6.18/block 02/10] blk-mq: remove useless checkings in blk_mq_update_nr_requests()
  2025-09-10  8:04 [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests Yu Kuai
  2025-09-10  8:04 ` [PATCH v2 for-6.18/block 01/10] blk-mq: remove useless checking in queue_requests_store() Yu Kuai
@ 2025-09-10  8:04 ` Yu Kuai
  2025-09-10  8:04 ` [PATCH v2 for-6.18/block 03/10] blk-mq: check invalid nr_requests in queue_requests_store() Yu Kuai
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Yu Kuai @ 2025-09-10  8:04 UTC (permalink / raw)
  To: axboe, nilay, ming.lei
  Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
	johnny.chenyi

From: Yu Kuai <yukuai3@huawei.com>

1) queue_requests_store() is the only caller of
blk_mq_update_nr_requests(), and the queue is already frozen there, so
there is no need to check mq_freeze_depth;
2) q->tag_set must be set for a request-based device, and queue_is_mq()
is already checked in blk_mq_queue_attr_visible(), so there is no need
to check q->tag_set.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
---
 block/blk-mq.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 31cc743ffad7..55ccc9f4435d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4930,21 +4930,14 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 {
 	struct blk_mq_tag_set *set = q->tag_set;
 	struct blk_mq_hw_ctx *hctx;
-	int ret;
+	int ret = 0;
 	unsigned long i;
 
-	if (WARN_ON_ONCE(!q->mq_freeze_depth))
-		return -EINVAL;
-
-	if (!set)
-		return -EINVAL;
-
 	if (q->nr_requests == nr)
 		return 0;
 
 	blk_mq_quiesce_queue(q);
 
-	ret = 0;
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if (!hctx->tags)
 			continue;
-- 
2.39.2



* [PATCH v2 for-6.18/block 03/10] blk-mq: check invalid nr_requests in queue_requests_store()
  2025-09-10  8:04 [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests Yu Kuai
  2025-09-10  8:04 ` [PATCH v2 for-6.18/block 01/10] blk-mq: remove useless checking in queue_requests_store() Yu Kuai
  2025-09-10  8:04 ` [PATCH v2 for-6.18/block 02/10] blk-mq: remove useless checkings in blk_mq_update_nr_requests() Yu Kuai
@ 2025-09-10  8:04 ` Yu Kuai
  2025-09-10  8:04 ` [PATCH v2 for-6.18/block 04/10] blk-mq: convert to serialize updating nr_requests with update_nr_hwq_lock Yu Kuai
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Yu Kuai @ 2025-09-10  8:04 UTC (permalink / raw)
  To: axboe, nilay, ming.lei
  Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
	johnny.chenyi

From: Yu Kuai <yukuai3@huawei.com>

queue_requests_store() is the only caller of
blk_mq_update_nr_requests(), and blk_mq_update_nr_requests() is the
only caller of blk_mq_tag_update_depth(); however, they all contain
checks on the nr_requests value supplied by the user.

Make the code cleaner by moving all the checks into the top-level
function:

1) nr_requests > reserved tags;
2) if there is an elevator, 4 <= nr_requests <= 2048;
3) if the elevator is none, 4 <= nr_requests <= tag_set->queue_depth.

Meanwhile, case 2 is the only case in which tags can grow and -ENOMEM
might be returned.
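As a rough user-space illustration (not kernel code; the struct and
constants below are hypothetical stand-ins for the queue state involved),
the consolidated checks could look like:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel constants and state involved. */
#define BLKDEV_MIN_RQ 4
#define MAX_SCHED_RQ  2048

struct fake_queue {
	bool has_elevator;
	unsigned int reserved_tags;
	unsigned int queue_depth;
};

/* Sketch of the validation moved into queue_requests_store():
 * returns true when nr is an acceptable nr_requests value. */
static bool nr_requests_valid(const struct fake_queue *q, unsigned int nr)
{
	if (nr < BLKDEV_MIN_RQ)
		nr = BLKDEV_MIN_RQ;        /* clamped, like the sysfs handler */
	if (nr <= q->reserved_tags)
		return false;              /* must exceed the reserved tags */
	if (q->has_elevator)
		return nr <= MAX_SCHED_RQ; /* sched tags may grow up to the cap */
	return nr <= q->queue_depth;       /* hw tags can never exceed set depth */
}
```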

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
---
 block/blk-mq-tag.c | 16 +---------------
 block/blk-mq.c     |  8 ++------
 block/blk-mq.h     |  2 +-
 block/blk-sysfs.c  | 13 +++++++++++++
 4 files changed, 17 insertions(+), 22 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 086c67849e06..56f9bc839000 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -610,14 +610,10 @@ void blk_mq_free_tags(struct blk_mq_tag_set *set, struct blk_mq_tags *tags)
 }
 
 int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
-			    struct blk_mq_tags **tagsptr, unsigned int tdepth,
-			    bool can_grow)
+			    struct blk_mq_tags **tagsptr, unsigned int tdepth)
 {
 	struct blk_mq_tags *tags = *tagsptr;
 
-	if (tdepth <= tags->nr_reserved_tags)
-		return -EINVAL;
-
 	/*
 	 * If we are allowed to grow beyond the original size, allocate
 	 * a new set of tags before freeing the old one.
@@ -626,16 +622,6 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 		struct blk_mq_tag_set *set = hctx->queue->tag_set;
 		struct blk_mq_tags *new;
 
-		if (!can_grow)
-			return -EINVAL;
-
-		/*
-		 * We need some sort of upper limit, set it high enough that
-		 * no valid use cases should require more.
-		 */
-		if (tdepth > MAX_SCHED_RQ)
-			return -EINVAL;
-
 		/*
 		 * Only the sbitmap needs resizing since we allocated the max
 		 * initially.
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 55ccc9f4435d..9b97f2f3f2c9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4933,9 +4933,6 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 	int ret = 0;
 	unsigned long i;
 
-	if (q->nr_requests == nr)
-		return 0;
-
 	blk_mq_quiesce_queue(q);
 
 	queue_for_each_hw_ctx(q, hctx, i) {
@@ -4947,10 +4944,9 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 		 */
 		if (hctx->sched_tags) {
 			ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
-						      nr, true);
+						      nr);
 		} else {
-			ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
-						      false);
+			ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr);
 		}
 		if (ret)
 			goto out;
diff --git a/block/blk-mq.h b/block/blk-mq.h
index b96a753809ab..5d42c7d3a952 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -171,7 +171,7 @@ void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
 		unsigned int tag);
 void blk_mq_put_tags(struct blk_mq_tags *tags, int *tag_array, int nr_tags);
 int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
-		struct blk_mq_tags **tags, unsigned int depth, bool can_grow);
+		struct blk_mq_tags **tags, unsigned int depth);
 void blk_mq_tag_resize_shared_tags(struct blk_mq_tag_set *set,
 		unsigned int size);
 void blk_mq_tag_update_sched_shared_tags(struct request_queue *q);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 1ffa65feca4f..f99519f7a820 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -75,12 +75,25 @@ queue_requests_store(struct gendisk *disk, const char *page, size_t count)
 
 	memflags = blk_mq_freeze_queue(q);
 	mutex_lock(&q->elevator_lock);
+
+	if (nr == q->nr_requests)
+		goto unlock;
+
 	if (nr < BLKDEV_MIN_RQ)
 		nr = BLKDEV_MIN_RQ;
 
+	if (nr <= q->tag_set->reserved_tags ||
+	    (q->elevator && nr > MAX_SCHED_RQ) ||
+	    (!q->elevator && nr > q->tag_set->queue_depth)) {
+		ret = -EINVAL;
+		goto unlock;
+	}
+
 	err = blk_mq_update_nr_requests(disk->queue, nr);
 	if (err)
 		ret = err;
+
+unlock:
 	mutex_unlock(&q->elevator_lock);
 	blk_mq_unfreeze_queue(q, memflags);
 	return ret;
-- 
2.39.2



* [PATCH v2 for-6.18/block 04/10] blk-mq: convert to serialize updating nr_requests with update_nr_hwq_lock
  2025-09-10  8:04 [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests Yu Kuai
                   ` (2 preceding siblings ...)
  2025-09-10  8:04 ` [PATCH v2 for-6.18/block 03/10] blk-mq: check invalid nr_requests in queue_requests_store() Yu Kuai
@ 2025-09-10  8:04 ` Yu Kuai
  2025-09-10 10:05   ` Nilay Shroff
  2025-09-10  8:04 ` [PATCH v2 for-6.18/block 05/10] blk-mq: cleanup shared tags case in blk_mq_update_nr_requests() Yu Kuai
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 18+ messages in thread
From: Yu Kuai @ 2025-09-10  8:04 UTC (permalink / raw)
  To: axboe, nilay, ming.lei
  Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
	johnny.chenyi

From: Yu Kuai <yukuai3@huawei.com>

request_queue->nr_requests can be changed by:

a) switching the elevator by updating nr_hw_queues
b) switching the elevator via the elevator sysfs attribute
c) configuring the queue sysfs attribute nr_requests

The current lock order is:

1) update_nr_hwq_lock, cases a and b
2) freeze_queue
3) elevator_lock, cases a, b and c

Updating nr_requests is already serialized by elevator_lock; however,
in case c we will have to allocate new sched_tags if nr_requests grows,
and doing that with elevator_lock held and the queue frozen risks
deadlock.

Hence use update_nr_hwq_lock instead, making it possible to allocate
memory when tags grow, while still preventing nr_requests from being
changed concurrently.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/blk-sysfs.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index f99519f7a820..ff66d1b47169 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -68,13 +68,17 @@ queue_requests_store(struct gendisk *disk, const char *page, size_t count)
 	int ret, err;
 	unsigned int memflags;
 	struct request_queue *q = disk->queue;
+	struct blk_mq_tag_set *set = q->tag_set;
 
 	ret = queue_var_store(&nr, page, count);
 	if (ret < 0)
 		return ret;
 
-	memflags = blk_mq_freeze_queue(q);
-	mutex_lock(&q->elevator_lock);
+	/*
+	 * Serialize updating nr_requests with concurrent queue_requests_store()
+	 * and switching elevator.
+	 */
+	down_write(&set->update_nr_hwq_lock);
 
 	if (nr == q->nr_requests)
 		goto unlock;
@@ -82,20 +86,31 @@ queue_requests_store(struct gendisk *disk, const char *page, size_t count)
 	if (nr < BLKDEV_MIN_RQ)
 		nr = BLKDEV_MIN_RQ;
 
-	if (nr <= q->tag_set->reserved_tags ||
+	/*
+	 * Switching elevator is protected by update_nr_hwq_lock:
+	 *  - read lock is held from elevator sysfs attribute;
+	 *  - write lock is held from updating nr_hw_queues;
+	 * Hence it's safe to access q->elevator here with write lock held.
+	 */
+	if (nr <= set->reserved_tags ||
 	    (q->elevator && nr > MAX_SCHED_RQ) ||
-	    (!q->elevator && nr > q->tag_set->queue_depth)) {
+	    (!q->elevator && nr > set->queue_depth)) {
 		ret = -EINVAL;
 		goto unlock;
 	}
 
+	memflags = blk_mq_freeze_queue(q);
+	mutex_lock(&q->elevator_lock);
+
 	err = blk_mq_update_nr_requests(disk->queue, nr);
 	if (err)
 		ret = err;
 
-unlock:
 	mutex_unlock(&q->elevator_lock);
 	blk_mq_unfreeze_queue(q, memflags);
+
+unlock:
+	up_write(&set->update_nr_hwq_lock);
 	return ret;
 }
 
-- 
2.39.2



* [PATCH v2 for-6.18/block 05/10] blk-mq: cleanup shared tags case in blk_mq_update_nr_requests()
  2025-09-10  8:04 [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests Yu Kuai
                   ` (3 preceding siblings ...)
  2025-09-10  8:04 ` [PATCH v2 for-6.18/block 04/10] blk-mq: convert to serialize updating nr_requests with update_nr_hwq_lock Yu Kuai
@ 2025-09-10  8:04 ` Yu Kuai
  2025-10-14 13:05   ` Chris Mason
  2025-09-10  8:04 ` [PATCH v2 for-6.18/block 06/10] blk-mq: split bitmap grow and resize " Yu Kuai
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 18+ messages in thread
From: Yu Kuai @ 2025-09-10  8:04 UTC (permalink / raw)
  To: axboe, nilay, ming.lei
  Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
	johnny.chenyi

From: Yu Kuai <yukuai3@huawei.com>

For the shared tags case, all hctx->sched_tags/tags are the same, so it
doesn't make sense to call into blk_mq_tag_update_depth() multiple times
for the same tags.
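The redundancy can be modeled in a few lines (hypothetical stand-ins, not
the kernel structures): when every hctx aliases one shared tags struct,
a single resize covers them all.

```c
/* Hypothetical model: with shared tags every hctx points at the same
 * struct, so one resize is enough; resizing per-hctx is redundant. */
struct tags { unsigned int depth; };

static int resize_calls;

static void resize(struct tags *t, unsigned int nr)
{
	t->depth = nr;
	resize_calls++;
}

static void update_depth(struct tags **hctx_tags, int nr_hctx,
			 int shared, unsigned int nr)
{
	if (shared) {
		resize(hctx_tags[0], nr);     /* all entries alias one struct */
	} else {
		for (int i = 0; i < nr_hctx; i++)
			resize(hctx_tags[i], nr); /* distinct tags per hctx */
	}
}
```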

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
---
 block/blk-mq-tag.c |  7 -------
 block/blk-mq.c     | 43 ++++++++++++++++++++++---------------------
 2 files changed, 22 insertions(+), 28 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 56f9bc839000..936b68273cad 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -622,13 +622,6 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 		struct blk_mq_tag_set *set = hctx->queue->tag_set;
 		struct blk_mq_tags *new;
 
-		/*
-		 * Only the sbitmap needs resizing since we allocated the max
-		 * initially.
-		 */
-		if (blk_mq_is_shared_tags(set->flags))
-			return 0;
-
 		new = blk_mq_alloc_map_and_rqs(set, hctx->queue_num, tdepth);
 		if (!new)
 			return -ENOMEM;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 9b97f2f3f2c9..80c20700bce8 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4935,34 +4935,35 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 
 	blk_mq_quiesce_queue(q);
 
-	queue_for_each_hw_ctx(q, hctx, i) {
-		if (!hctx->tags)
-			continue;
-		/*
-		 * If we're using an MQ scheduler, just update the scheduler
-		 * queue depth. This is similar to what the old code would do.
-		 */
-		if (hctx->sched_tags) {
-			ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
-						      nr);
-		} else {
-			ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr);
-		}
-		if (ret)
-			goto out;
-	}
-
-	q->nr_requests = nr;
-	if (q->elevator && q->elevator->type->ops.depth_updated)
-		q->elevator->type->ops.depth_updated(q);
-
 	if (blk_mq_is_shared_tags(set->flags)) {
 		if (q->elevator)
 			blk_mq_tag_update_sched_shared_tags(q);
 		else
 			blk_mq_tag_resize_shared_tags(set, nr);
+	} else {
+		queue_for_each_hw_ctx(q, hctx, i) {
+			if (!hctx->tags)
+				continue;
+			/*
+			 * If we're using an MQ scheduler, just update the
+			 * scheduler queue depth. This is similar to what the
+			 * old code would do.
+			 */
+			if (hctx->sched_tags)
+				ret = blk_mq_tag_update_depth(hctx,
+							&hctx->sched_tags, nr);
+			else
+				ret = blk_mq_tag_update_depth(hctx,
+							&hctx->tags, nr);
+			if (ret)
+				goto out;
+		}
 	}
 
+	q->nr_requests = nr;
+	if (q->elevator && q->elevator->type->ops.depth_updated)
+		q->elevator->type->ops.depth_updated(q);
+
 out:
 	blk_mq_unquiesce_queue(q);
 
-- 
2.39.2



* [PATCH v2 for-6.18/block 06/10] blk-mq: split bitmap grow and resize case in blk_mq_update_nr_requests()
  2025-09-10  8:04 [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests Yu Kuai
                   ` (4 preceding siblings ...)
  2025-09-10  8:04 ` [PATCH v2 for-6.18/block 05/10] blk-mq: cleanup shared tags case in blk_mq_update_nr_requests() Yu Kuai
@ 2025-09-10  8:04 ` Yu Kuai
  2025-09-10 10:08   ` Nilay Shroff
  2025-09-10  8:04 ` [PATCH v2 for-6.18/block 07/10] blk-mq-sched: add new parameter nr_requests in blk_mq_alloc_sched_tags() Yu Kuai
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 18+ messages in thread
From: Yu Kuai @ 2025-09-10  8:04 UTC (permalink / raw)
  To: axboe, nilay, ming.lei
  Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
	johnny.chenyi

From: Yu Kuai <yukuai3@huawei.com>

No functional changes are intended; make the code cleaner and prepare to
fix the grow case in the following patches.
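The four cases the rewritten function distinguishes can be sketched as a
small decision helper (the enum and names below are hypothetical, for
illustration only):

```c
#include <stdbool.h>

/* Hypothetical enumeration of the four cases the rewritten
 * blk_mq_update_nr_requests() distinguishes. */
enum update_case {
	CASE_SHARED_RESIZE,	/* shared tags: resize only, never grow */
	CASE_HW_RESIZE,		/* non-shared, no elevator: resize hw bitmap */
	CASE_SCHED_RESIZE,	/* non-shared sched tags, nr fits: resize */
	CASE_SCHED_GROW,	/* non-shared sched tags, nr grew: realloc */
};

static enum update_case classify(bool shared, bool has_elevator,
				 unsigned int nr, unsigned int sched_depth)
{
	if (shared)
		return CASE_SHARED_RESIZE;
	if (!has_elevator)
		return CASE_HW_RESIZE;
	return nr <= sched_depth ? CASE_SCHED_RESIZE : CASE_SCHED_GROW;
}
```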

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/blk-mq.c | 39 +++++++++++++++++++++++++++------------
 1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 80c20700bce8..299aac458185 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4936,25 +4936,40 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 	blk_mq_quiesce_queue(q);
 
 	if (blk_mq_is_shared_tags(set->flags)) {
+		/*
+		 * Shared tags, for sched tags, we allocate max initially hence
+		 * tags can't grow, see blk_mq_alloc_sched_tags().
+		 */
 		if (q->elevator)
 			blk_mq_tag_update_sched_shared_tags(q);
 		else
 			blk_mq_tag_resize_shared_tags(set, nr);
-	} else {
+	} else if (!q->elevator) {
+		/*
+		 * Non-shared hardware tags, nr is already checked from
+		 * queue_requests_store() and tags can't grow.
+		 */
 		queue_for_each_hw_ctx(q, hctx, i) {
 			if (!hctx->tags)
 				continue;
-			/*
-			 * If we're using an MQ scheduler, just update the
-			 * scheduler queue depth. This is similar to what the
-			 * old code would do.
-			 */
-			if (hctx->sched_tags)
-				ret = blk_mq_tag_update_depth(hctx,
-							&hctx->sched_tags, nr);
-			else
-				ret = blk_mq_tag_update_depth(hctx,
-							&hctx->tags, nr);
+			sbitmap_queue_resize(&hctx->tags->bitmap_tags,
+				nr - hctx->tags->nr_reserved_tags);
+		}
+	} else if (nr <= q->elevator->et->nr_requests) {
+		/* Non-shared sched tags, and tags don't grow. */
+		queue_for_each_hw_ctx(q, hctx, i) {
+			if (!hctx->sched_tags)
+				continue;
+			sbitmap_queue_resize(&hctx->sched_tags->bitmap_tags,
+				nr - hctx->sched_tags->nr_reserved_tags);
+		}
+	} else {
+		/* Non-shared sched tags, and tags grow */
+		queue_for_each_hw_ctx(q, hctx, i) {
+			if (!hctx->sched_tags)
+				continue;
+			ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
+						      nr);
 			if (ret)
 				goto out;
 		}
-- 
2.39.2



* [PATCH v2 for-6.18/block 07/10] blk-mq-sched: add new parameter nr_requests in blk_mq_alloc_sched_tags()
  2025-09-10  8:04 [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests Yu Kuai
                   ` (5 preceding siblings ...)
  2025-09-10  8:04 ` [PATCH v2 for-6.18/block 06/10] blk-mq: split bitmap grow and resize " Yu Kuai
@ 2025-09-10  8:04 ` Yu Kuai
  2025-09-10  8:04 ` [PATCH v2 for-6.18/block 08/10] blk-mq: fix potential deadlock while nr_requests grown Yu Kuai
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Yu Kuai @ 2025-09-10  8:04 UTC (permalink / raw)
  To: axboe, nilay, ming.lei
  Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
	johnny.chenyi

From: Yu Kuai <yukuai3@huawei.com>

This helper only supports allocating the default number of requests; add
a new parameter to support a specific number of requests.

This prepares for fixing a potential deadlock in the case where
nr_requests grows.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
---
 block/blk-mq-sched.c | 14 +++++---------
 block/blk-mq-sched.h |  2 +-
 block/blk-mq.h       | 11 +++++++++++
 block/elevator.c     |  3 ++-
 4 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index e2ce4a28e6c9..d06bb137a743 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -454,7 +454,7 @@ void blk_mq_free_sched_tags_batch(struct xarray *et_table,
 }
 
 struct elevator_tags *blk_mq_alloc_sched_tags(struct blk_mq_tag_set *set,
-		unsigned int nr_hw_queues)
+		unsigned int nr_hw_queues, unsigned int nr_requests)
 {
 	unsigned int nr_tags;
 	int i;
@@ -470,13 +470,8 @@ struct elevator_tags *blk_mq_alloc_sched_tags(struct blk_mq_tag_set *set,
 			nr_tags * sizeof(struct blk_mq_tags *), gfp);
 	if (!et)
 		return NULL;
-	/*
-	 * Default to double of smaller one between hw queue_depth and
-	 * 128, since we don't split into sync/async like the old code
-	 * did. Additionally, this is a per-hw queue depth.
-	 */
-	et->nr_requests = 2 * min_t(unsigned int, set->queue_depth,
-			BLKDEV_DEFAULT_RQ);
+
+	et->nr_requests = nr_requests;
 	et->nr_hw_queues = nr_hw_queues;
 
 	if (blk_mq_is_shared_tags(set->flags)) {
@@ -521,7 +516,8 @@ int blk_mq_alloc_sched_tags_batch(struct xarray *et_table,
 		 * concurrently.
 		 */
 		if (q->elevator) {
-			et = blk_mq_alloc_sched_tags(set, nr_hw_queues);
+			et = blk_mq_alloc_sched_tags(set, nr_hw_queues,
+					blk_mq_default_nr_requests(set));
 			if (!et)
 				goto out_unwind;
 			if (xa_insert(et_table, q->id, et, gfp))
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index fe83187f41db..8e21a6b1415d 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -24,7 +24,7 @@ void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e);
 void blk_mq_sched_free_rqs(struct request_queue *q);
 
 struct elevator_tags *blk_mq_alloc_sched_tags(struct blk_mq_tag_set *set,
-		unsigned int nr_hw_queues);
+		unsigned int nr_hw_queues, unsigned int nr_requests);
 int blk_mq_alloc_sched_tags_batch(struct xarray *et_table,
 		struct blk_mq_tag_set *set, unsigned int nr_hw_queues);
 void blk_mq_free_sched_tags(struct elevator_tags *et,
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 5d42c7d3a952..3a1d4c37d1bc 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -109,6 +109,17 @@ static inline struct blk_mq_hw_ctx *blk_mq_map_queue(blk_opf_t opf,
 	return ctx->hctxs[blk_mq_get_hctx_type(opf)];
 }
 
+/*
+ * Default to double of smaller one between hw queue_depth and
+ * 128, since we don't split into sync/async like the old code
+ * did. Additionally, this is a per-hw queue depth.
+ */
+static inline unsigned int blk_mq_default_nr_requests(
+		struct blk_mq_tag_set *set)
+{
+	return 2 * min_t(unsigned int, set->queue_depth, BLKDEV_DEFAULT_RQ);
+}
+
 /*
  * sysfs helpers
  */
diff --git a/block/elevator.c b/block/elevator.c
index fe96c6f4753c..e2ebfbf107b3 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -669,7 +669,8 @@ static int elevator_change(struct request_queue *q, struct elv_change_ctx *ctx)
 	lockdep_assert_held(&set->update_nr_hwq_lock);
 
 	if (strncmp(ctx->name, "none", 4)) {
-		ctx->et = blk_mq_alloc_sched_tags(set, set->nr_hw_queues);
+		ctx->et = blk_mq_alloc_sched_tags(set, set->nr_hw_queues,
+				blk_mq_default_nr_requests(set));
 		if (!ctx->et)
 			return -ENOMEM;
 	}
-- 
2.39.2



* [PATCH v2 for-6.18/block 08/10] blk-mq: fix potential deadlock while nr_requests grown
  2025-09-10  8:04 [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests Yu Kuai
                   ` (6 preceding siblings ...)
  2025-09-10  8:04 ` [PATCH v2 for-6.18/block 07/10] blk-mq-sched: add new parameter nr_requests in blk_mq_alloc_sched_tags() Yu Kuai
@ 2025-09-10  8:04 ` Yu Kuai
  2025-09-10 10:11   ` Nilay Shroff
  2025-09-10  8:04 ` [PATCH v2 for-6.18/block 09/10] blk-mq: remove blk_mq_tag_update_depth() Yu Kuai
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 18+ messages in thread
From: Yu Kuai @ 2025-09-10  8:04 UTC (permalink / raw)
  To: axboe, nilay, ming.lei
  Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
	johnny.chenyi

From: Yu Kuai <yukuai3@huawei.com>

Allocating and freeing sched_tags while the queue is frozen can
deadlock[1]. This is a long-standing problem; hence allocate memory
before freezing the queue and free memory after the queue is unfrozen.

[1] https://lore.kernel.org/all/0659ea8d-a463-47c8-9180-43c719e106eb@linux.ibm.com/
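The allocate-before-freeze / free-after-unfreeze ordering can be sketched
in user space (the struct and freeze placeholders below are hypothetical
stand-ins; only the ordering matters):

```c
#include <stdlib.h>

/* Hypothetical stand-in for a tags allocation; the point is the
 * ordering, which avoids allocating or freeing under a frozen queue. */
struct tags { unsigned int depth; };

static struct tags *grow_tags(struct tags **cur, unsigned int nr)
{
	struct tags *fresh = NULL, *old = NULL;

	if (nr > (*cur)->depth) {
		fresh = malloc(sizeof(*fresh)); /* allocate BEFORE freezing */
		if (!fresh)
			return NULL;
		fresh->depth = nr;
	}
	/* freeze_queue() would happen here */
	if (fresh) {
		old = *cur;                     /* swap under the frozen queue */
		*cur = fresh;
	}
	/* unfreeze_queue() would happen here */
	free(old);                              /* free AFTER unfreezing */
	return *cur;
}
```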
Fixes: e3a2b3f931f5 ("blk-mq: allow changing of queue depth through sysfs")

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/blk-mq.c    | 22 +++++++++-------------
 block/blk-mq.h    |  5 ++++-
 block/blk-sysfs.c | 29 +++++++++++++++++++++--------
 3 files changed, 34 insertions(+), 22 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 299aac458185..4ccf11cadf8c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4926,11 +4926,13 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
 }
 EXPORT_SYMBOL(blk_mq_free_tag_set);
 
-int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
+struct elevator_tags *blk_mq_update_nr_requests(struct request_queue *q,
+						struct elevator_tags *et,
+						unsigned int nr)
 {
 	struct blk_mq_tag_set *set = q->tag_set;
+	struct elevator_tags *old_et = NULL;
 	struct blk_mq_hw_ctx *hctx;
-	int ret = 0;
 	unsigned long i;
 
 	blk_mq_quiesce_queue(q);
@@ -4965,24 +4967,18 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 		}
 	} else {
 		/* Non-shared sched tags, and tags grow */
-		queue_for_each_hw_ctx(q, hctx, i) {
-			if (!hctx->sched_tags)
-				continue;
-			ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
-						      nr);
-			if (ret)
-				goto out;
-		}
+		queue_for_each_hw_ctx(q, hctx, i)
+			hctx->sched_tags = et->tags[i];
+		old_et =  q->elevator->et;
+		q->elevator->et = et;
 	}
 
 	q->nr_requests = nr;
 	if (q->elevator && q->elevator->type->ops.depth_updated)
 		q->elevator->type->ops.depth_updated(q);
 
-out:
 	blk_mq_unquiesce_queue(q);
-
-	return ret;
+	return old_et;
 }
 
 /*
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 3a1d4c37d1bc..0ef96f510477 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -6,6 +6,7 @@
 #include "blk-stat.h"
 
 struct blk_mq_tag_set;
+struct elevator_tags;
 
 struct blk_mq_ctxs {
 	struct kobject kobj;
@@ -45,7 +46,9 @@ void blk_mq_submit_bio(struct bio *bio);
 int blk_mq_poll(struct request_queue *q, blk_qc_t cookie, struct io_comp_batch *iob,
 		unsigned int flags);
 void blk_mq_exit_queue(struct request_queue *q);
-int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr);
+struct elevator_tags *blk_mq_update_nr_requests(struct request_queue *q,
+						struct elevator_tags *tags,
+						unsigned int nr);
 void blk_mq_wake_waiters(struct request_queue *q);
 bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *,
 			     bool);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index ff66d1b47169..76c47fe9b8d6 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -64,11 +64,12 @@ static ssize_t queue_requests_show(struct gendisk *disk, char *page)
 static ssize_t
 queue_requests_store(struct gendisk *disk, const char *page, size_t count)
 {
-	unsigned long nr;
-	int ret, err;
-	unsigned int memflags;
 	struct request_queue *q = disk->queue;
 	struct blk_mq_tag_set *set = q->tag_set;
+	struct elevator_tags *et = NULL;
+	unsigned int memflags;
+	unsigned long nr;
+	int ret;
 
 	ret = queue_var_store(&nr, page, count);
 	if (ret < 0)
@@ -99,16 +100,28 @@ queue_requests_store(struct gendisk *disk, const char *page, size_t count)
 		goto unlock;
 	}
 
+	if (!blk_mq_is_shared_tags(set->flags) && q->elevator &&
+	    nr > q->elevator->et->nr_requests) {
+		/*
+		 * Tags will grow, allocate memory before freezing queue to
+		 * prevent deadlock.
+		 */
+		et = blk_mq_alloc_sched_tags(set, q->nr_hw_queues, nr);
+		if (!et) {
+			ret = -ENOMEM;
+			goto unlock;
+		}
+	}
+
 	memflags = blk_mq_freeze_queue(q);
 	mutex_lock(&q->elevator_lock);
-
-	err = blk_mq_update_nr_requests(disk->queue, nr);
-	if (err)
-		ret = err;
-
+	et = blk_mq_update_nr_requests(q, et, nr);
 	mutex_unlock(&q->elevator_lock);
 	blk_mq_unfreeze_queue(q, memflags);
 
+	if (et)
+		blk_mq_free_sched_tags(et, set);
+
 unlock:
 	up_write(&set->update_nr_hwq_lock);
 	return ret;
-- 
2.39.2



* [PATCH v2 for-6.18/block 09/10] blk-mq: remove blk_mq_tag_update_depth()
  2025-09-10  8:04 [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests Yu Kuai
                   ` (7 preceding siblings ...)
  2025-09-10  8:04 ` [PATCH v2 for-6.18/block 08/10] blk-mq: fix potential deadlock while nr_requests grown Yu Kuai
@ 2025-09-10  8:04 ` Yu Kuai
  2025-09-10  8:04 ` [PATCH v2 for-6.18/block 10/10] blk-mq: fix stale nr_requests documentation Yu Kuai
  2025-09-10 11:30 ` [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests Jens Axboe
  10 siblings, 0 replies; 18+ messages in thread
From: Yu Kuai @ 2025-09-10  8:04 UTC (permalink / raw)
  To: axboe, nilay, ming.lei
  Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
	johnny.chenyi

From: Yu Kuai <yukuai3@huawei.com>

This helper is now unused.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
---
 block/blk-mq-tag.c | 32 --------------------------------
 block/blk-mq.h     |  2 --
 2 files changed, 34 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 936b68273cad..a63d21a4aab4 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -609,38 +609,6 @@ void blk_mq_free_tags(struct blk_mq_tag_set *set, struct blk_mq_tags *tags)
 	call_srcu(&set->tags_srcu, &tags->rcu_head, blk_mq_free_tags_callback);
 }
 
-int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
-			    struct blk_mq_tags **tagsptr, unsigned int tdepth)
-{
-	struct blk_mq_tags *tags = *tagsptr;
-
-	/*
-	 * If we are allowed to grow beyond the original size, allocate
-	 * a new set of tags before freeing the old one.
-	 */
-	if (tdepth > tags->nr_tags) {
-		struct blk_mq_tag_set *set = hctx->queue->tag_set;
-		struct blk_mq_tags *new;
-
-		new = blk_mq_alloc_map_and_rqs(set, hctx->queue_num, tdepth);
-		if (!new)
-			return -ENOMEM;
-
-		blk_mq_free_map_and_rqs(set, *tagsptr, hctx->queue_num);
-		hctx->queue->elevator->et->tags[hctx->queue_num] = new;
-		*tagsptr = new;
-	} else {
-		/*
-		 * Don't need (or can't) update reserved tags here, they
-		 * remain static and should never need resizing.
-		 */
-		sbitmap_queue_resize(&tags->bitmap_tags,
-				tdepth - tags->nr_reserved_tags);
-	}
-
-	return 0;
-}
-
 void blk_mq_tag_resize_shared_tags(struct blk_mq_tag_set *set, unsigned int size)
 {
 	struct blk_mq_tags *tags = set->shared_tags;
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 0ef96f510477..af42dc018808 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -184,8 +184,6 @@ unsigned long blk_mq_get_tags(struct blk_mq_alloc_data *data, int nr_tags,
 void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
 		unsigned int tag);
 void blk_mq_put_tags(struct blk_mq_tags *tags, int *tag_array, int nr_tags);
-int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
-		struct blk_mq_tags **tags, unsigned int depth);
 void blk_mq_tag_resize_shared_tags(struct blk_mq_tag_set *set,
 		unsigned int size);
 void blk_mq_tag_update_sched_shared_tags(struct request_queue *q);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 for-6.18/block 10/10] blk-mq: fix stale nr_requests documentation
  2025-09-10  8:04 [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests Yu Kuai
                   ` (8 preceding siblings ...)
  2025-09-10  8:04 ` [PATCH v2 for-6.18/block 09/10] blk-mq: remove blk_mq_tag_update_depth() Yu Kuai
@ 2025-09-10  8:04 ` Yu Kuai
  2025-09-10 11:30 ` [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests Jens Axboe
  10 siblings, 0 replies; 18+ messages in thread
From: Yu Kuai @ 2025-09-10  8:04 UTC (permalink / raw)
  To: axboe, nilay, ming.lei
  Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
	johnny.chenyi

From: Yu Kuai <yukuai3@huawei.com>

The nr_requests documentation still describes the removed single-queue
code. Remove that text and update it to match current blk-mq behavior.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
---
 Documentation/ABI/stable/sysfs-block | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
index 0ddffc9133d0..0ed10aeff86b 100644
--- a/Documentation/ABI/stable/sysfs-block
+++ b/Documentation/ABI/stable/sysfs-block
@@ -603,16 +603,10 @@ Date:		July 2003
 Contact:	linux-block@vger.kernel.org
 Description:
 		[RW] This controls how many requests may be allocated in the
-		block layer for read or write requests. Note that the total
-		allocated number may be twice this amount, since it applies only
-		to reads or writes (not the accumulated sum).
-
-		To avoid priority inversion through request starvation, a
-		request queue maintains a separate request pool per each cgroup
-		when CONFIG_BLK_CGROUP is enabled, and this parameter applies to
-		each such per-block-cgroup request pool.  IOW, if there are N
-		block cgroups, each request queue may have up to N request
-		pools, each independently regulated by nr_requests.
+		block layer. Note that this value only represents the depth
+		of a single blk_mq_tags instance. The actual number for the
+		entire device depends on the hardware queue count, whether an
+		elevator is enabled, and whether tags are shared.
 
 
 What:		/sys/block/<disk>/queue/nr_zones
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 for-6.18/block 04/10] blk-mq: convert to serialize updating nr_requests with update_nr_hwq_lock
  2025-09-10  8:04 ` [PATCH v2 for-6.18/block 04/10] blk-mq: convert to serialize updating nr_requests with update_nr_hwq_lock Yu Kuai
@ 2025-09-10 10:05   ` Nilay Shroff
  0 siblings, 0 replies; 18+ messages in thread
From: Nilay Shroff @ 2025-09-10 10:05 UTC (permalink / raw)
  To: Yu Kuai, axboe, ming.lei
  Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
	johnny.chenyi



On 9/10/25 1:34 PM, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> request_queue->nr_requests can be changed by:
> 
> a) switch elevator by updating nr_hw_queues
> b) switch elevator by elevator sysfs attribute
> c) configure the queue sysfs attribute nr_requests
> 
> Current lock order is:
> 
> 1) update_nr_hwq_lock, case a,b
> 2) freeze_queue
> 3) elevator_lock, case a,b,c
> 
> Updating nr_requests is already serialized by elevator_lock, however,
> in case c) we have to allocate new sched_tags if nr_requests grows,
> and doing this with elevator_lock held and the queue frozen risks
> deadlock.
> 
> Hence use update_nr_hwq_lock instead, making it possible to allocate
> memory when tags grow, while also preventing nr_requests from being
> changed concurrently.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Looks good to me:
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 for-6.18/block 06/10] blk-mq: split bitmap grow and resize case in blk_mq_update_nr_requests()
  2025-09-10  8:04 ` [PATCH v2 for-6.18/block 06/10] blk-mq: split bitmap grow and resize " Yu Kuai
@ 2025-09-10 10:08   ` Nilay Shroff
  0 siblings, 0 replies; 18+ messages in thread
From: Nilay Shroff @ 2025-09-10 10:08 UTC (permalink / raw)
  To: Yu Kuai, axboe, ming.lei
  Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
	johnny.chenyi



On 9/10/25 1:34 PM, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> No functional changes are intended; make the code cleaner and prepare
> to fix the grow case in the following patches.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Looks good to me:
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 for-6.18/block 08/10] blk-mq: fix potential deadlock while nr_requests grown
  2025-09-10  8:04 ` [PATCH v2 for-6.18/block 08/10] blk-mq: fix potential deadlock while nr_requests grown Yu Kuai
@ 2025-09-10 10:11   ` Nilay Shroff
  0 siblings, 0 replies; 18+ messages in thread
From: Nilay Shroff @ 2025-09-10 10:11 UTC (permalink / raw)
  To: Yu Kuai, axboe, ming.lei
  Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
	johnny.chenyi



On 9/10/25 1:34 PM, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Allocating and freeing sched_tags while the queue is frozen can
> deadlock[1]. This is a long-standing problem, hence allocate memory
> before freezing the queue and free memory after the queue is
> unfrozen.
> 
> [1] https://lore.kernel.org/all/0659ea8d-a463-47c8-9180-43c719e106eb@linux.ibm.com/
> Fixes: e3a2b3f931f5 ("blk-mq: allow changing of queue depth through sysfs")
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Looks good to me:
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests
  2025-09-10  8:04 [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests Yu Kuai
                   ` (9 preceding siblings ...)
  2025-09-10  8:04 ` [PATCH v2 for-6.18/block 10/10] blk-mq: fix stale nr_requests documentation Yu Kuai
@ 2025-09-10 11:30 ` Jens Axboe
  10 siblings, 0 replies; 18+ messages in thread
From: Jens Axboe @ 2025-09-10 11:30 UTC (permalink / raw)
  To: nilay, ming.lei, Yu Kuai
  Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
	johnny.chenyi


On Wed, 10 Sep 2025 16:04:35 +0800, Yu Kuai wrote:
> Changes from v1:
>  - add comments explaining accessing q->elevator without holding
>    elevator_lock in patch 4;
>  - add non-shared checking in patch 8;
>  - add review tag by Nilay, patch 1-5,7,9,10;
> 
> Yu Kuai (10):
>   blk-mq: remove useless checking in queue_requests_store()
>   blk-mq: remove useless checkings in blk_mq_update_nr_requests()
>   blk-mq: check invalid nr_requests in queue_requests_store()
>   blk-mq: convert to serialize updating nr_requests with
>     update_nr_hwq_lock
>   blk-mq: cleanup shared tags case in blk_mq_update_nr_requests()
>   blk-mq: split bitmap grow and resize case in
>     blk_mq_update_nr_requests()
>   blk-mq-sched: add new parameter nr_requests in
>     blk_mq_alloc_sched_tags()
>   blk-mq: fix potential deadlock while nr_requests grown
>   blk-mq: remove blk_mq_tag_update_depth()
>   blk-mq: fix stale nr_requests documentation
> 
> [...]

Applied, thanks!

[01/10] blk-mq: remove useless checking in queue_requests_store()
        commit: dc1dd13d44fa4e4d466476c0f3517c1230c237e4
[02/10] blk-mq: remove useless checkings in blk_mq_update_nr_requests()
        commit: 8bd7195fea6d9662aa3b32498a3828bfd9b63185
[03/10] blk-mq: check invalid nr_requests in queue_requests_store()
        commit: b46d4c447db76e36906ed59ebb9b3ef8f3383322
[04/10] blk-mq: convert to serialize updating nr_requests with update_nr_hwq_lock
        commit: 626ff4f8ebcb7207f01e7810acb85812ccf06bd8
[05/10] blk-mq: cleanup shared tags case in blk_mq_update_nr_requests()
        commit: 7f2799c546dba9e12f9ff4d07936601e416c640d
[06/10] blk-mq: split bitmap grow and resize case in blk_mq_update_nr_requests()
        commit: e63200404477456ec60c62dd8b3b1092aba2e211
[07/10] blk-mq-sched: add new parameter nr_requests in blk_mq_alloc_sched_tags()
        commit: 6293e336f6d7d3f3415346ce34993b3398846166
[08/10] blk-mq: fix potential deadlock while nr_requests grown
        commit: b86433721f46d934940528f28d49c1dedb690df1
[09/10] blk-mq: remove blk_mq_tag_update_depth()
        commit: 9784041145796994f2b21f4c7e628d7c9db762f4
[10/10] blk-mq: fix stale nr_requests documentation
        commit: a75fe12fa2e2f96b619f25b8cda1fdef6d616ab1

Best regards,
-- 
Jens Axboe




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 for-6.18/block 05/10] blk-mq: cleanup shared tags case in blk_mq_update_nr_requests()
  2025-09-10  8:04 ` [PATCH v2 for-6.18/block 05/10] blk-mq: cleanup shared tags case in blk_mq_update_nr_requests() Yu Kuai
@ 2025-10-14 13:05   ` Chris Mason
  2025-10-14 18:04     ` Nilay Shroff
  0 siblings, 1 reply; 18+ messages in thread
From: Chris Mason @ 2025-10-14 13:05 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Chris Mason, axboe, nilay, ming.lei, linux-block, linux-kernel,
	yukuai3, yi.zhang, yangerkun, johnny.chenyi

Hi everyone,

My review automation flagged this one in linux-next, and it looks like a
valid bug to me:

On Wed, 10 Sep 2025 16:04:40 +0800 Yu Kuai <yukuai1@huaweicloud.com> wrote:

> From: Yu Kuai <yukuai3@huawei.com>
> 
> For the shared tags case, all hctx->sched_tags/tags are the same, so it
> doesn't make sense to call into blk_mq_tag_update_depth() multiple times
> for the same tags.
> 

[ ... ]

> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 9b97f2f3f2c9..80c20700bce8 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -4935,34 +4935,35 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>
>  	blk_mq_quiesce_queue(q);
>
> -	queue_for_each_hw_ctx(q, hctx, i) {
> -		if (!hctx->tags)
> -			continue;
> -		/*
> -		 * If we're using an MQ scheduler, just update the scheduler
> -		 * queue depth. This is similar to what the old code would do.
> -		 */
> -		if (hctx->sched_tags) {
> -			ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
> -						      nr);
> -		} else {
> -			ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr);
> -		}
> -		if (ret)
> -			goto out;
> -	}
> -
> -	q->nr_requests = nr;
> -	if (q->elevator && q->elevator->type->ops.depth_updated)
> -		q->elevator->type->ops.depth_updated(q);
> -
>  	if (blk_mq_is_shared_tags(set->flags)) {
>  		if (q->elevator)
>  			blk_mq_tag_update_sched_shared_tags(q);
                        ^^^^

Does blk_mq_tag_update_sched_shared_tags() use the wrong value here?

In the old code, q->nr_requests was updated to nr before calling
blk_mq_tag_update_sched_shared_tags(). But in the new code, this
function is called while q->nr_requests still contains the old value.

Looking at blk_mq_tag_update_sched_shared_tags():

    void blk_mq_tag_update_sched_shared_tags(struct request_queue *q)
    {
        sbitmap_queue_resize(&q->sched_shared_tags->bitmap_tags,
                             q->nr_requests - q->tag_set->reserved_tags);
    }

It reads q->nr_requests to calculate the new sbitmap size. With the
reordering, this will resize the sbitmap to the old depth instead of
the new depth passed in nr.

-chris

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 for-6.18/block 05/10] blk-mq: cleanup shared tags case in blk_mq_update_nr_requests()
  2025-10-14 13:05   ` Chris Mason
@ 2025-10-14 18:04     ` Nilay Shroff
  2025-10-15  0:56       ` Yu Kuai
  0 siblings, 1 reply; 18+ messages in thread
From: Nilay Shroff @ 2025-10-14 18:04 UTC (permalink / raw)
  To: Chris Mason, Yu Kuai
  Cc: axboe, ming.lei, linux-block, linux-kernel, yukuai3, yi.zhang,
	yangerkun, johnny.chenyi



On 10/14/25 6:35 PM, Chris Mason wrote:
> Hi everyone,
> 
> My review automation flagged this one in linux-next, and it looks like a
> valid bug to me:
> 
> On Wed, 10 Sep 2025 16:04:40 +0800 Yu Kuai <yukuai1@huaweicloud.com> wrote:
> 
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> For the shared tags case, all hctx->sched_tags/tags are the same, so it
>> doesn't make sense to call into blk_mq_tag_update_depth() multiple times
>> for the same tags.
>>
> 
> [ ... ]
> 
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 9b97f2f3f2c9..80c20700bce8 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -4935,34 +4935,35 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>>
>>  	blk_mq_quiesce_queue(q);
>>
>> -	queue_for_each_hw_ctx(q, hctx, i) {
>> -		if (!hctx->tags)
>> -			continue;
>> -		/*
>> -		 * If we're using an MQ scheduler, just update the scheduler
>> -		 * queue depth. This is similar to what the old code would do.
>> -		 */
>> -		if (hctx->sched_tags) {
>> -			ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
>> -						      nr);
>> -		} else {
>> -			ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr);
>> -		}
>> -		if (ret)
>> -			goto out;
>> -	}
>> -
>> -	q->nr_requests = nr;
>> -	if (q->elevator && q->elevator->type->ops.depth_updated)
>> -		q->elevator->type->ops.depth_updated(q);
>> -
>>  	if (blk_mq_is_shared_tags(set->flags)) {
>>  		if (q->elevator)
>>  			blk_mq_tag_update_sched_shared_tags(q);
>                         ^^^^
> 
> Does blk_mq_tag_update_sched_shared_tags() use the wrong value here?
> 
> In the old code, q->nr_requests was updated to nr before calling
> blk_mq_tag_update_sched_shared_tags(). But in the new code, this
> function is called while q->nr_requests still contains the old value.
> 
> Looking at blk_mq_tag_update_sched_shared_tags():
> 
>     void blk_mq_tag_update_sched_shared_tags(struct request_queue *q)
>     {
>         sbitmap_queue_resize(&q->sched_shared_tags->bitmap_tags,
>                              q->nr_requests - q->tag_set->reserved_tags);
>     }
> 
> It reads q->nr_requests to calculate the new sbitmap size. With the
> reordering, this will resize the sbitmap to the old depth instead of
> the new depth passed in nr.
> 
Good catch! Yes, I think this needs to be fixed...

Thanks,
--Nilay



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 for-6.18/block 05/10] blk-mq: cleanup shared tags case in blk_mq_update_nr_requests()
  2025-10-14 18:04     ` Nilay Shroff
@ 2025-10-15  0:56       ` Yu Kuai
  0 siblings, 0 replies; 18+ messages in thread
From: Yu Kuai @ 2025-10-15  0:56 UTC (permalink / raw)
  To: Nilay Shroff, Chris Mason, Yu Kuai
  Cc: axboe, ming.lei, linux-block, linux-kernel, yi.zhang, yangerkun,
	johnny.chenyi, yukuai (C)

Hi,

On 2025/10/15 2:04, Nilay Shroff wrote:
> 
> 
> On 10/14/25 6:35 PM, Chris Mason wrote:
>> Hi everyone,
>>
>> My review automation flagged this one in linux-next, and it looks like a
>> valid bug to me:
>>
>> On Wed, 10 Sep 2025 16:04:40 +0800 Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>>> From: Yu Kuai <yukuai3@huawei.com>
>>>
>>> For the shared tags case, all hctx->sched_tags/tags are the same, so it
>>> doesn't make sense to call into blk_mq_tag_update_depth() multiple times
>>> for the same tags.
>>>
>>
>> [ ... ]
>>
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index 9b97f2f3f2c9..80c20700bce8 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -4935,34 +4935,35 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>>>
>>>   	blk_mq_quiesce_queue(q);
>>>
>>> -	queue_for_each_hw_ctx(q, hctx, i) {
>>> -		if (!hctx->tags)
>>> -			continue;
>>> -		/*
>>> -		 * If we're using an MQ scheduler, just update the scheduler
>>> -		 * queue depth. This is similar to what the old code would do.
>>> -		 */
>>> -		if (hctx->sched_tags) {
>>> -			ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
>>> -						      nr);
>>> -		} else {
>>> -			ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr);
>>> -		}
>>> -		if (ret)
>>> -			goto out;
>>> -	}
>>> -
>>> -	q->nr_requests = nr;
>>> -	if (q->elevator && q->elevator->type->ops.depth_updated)
>>> -		q->elevator->type->ops.depth_updated(q);
>>> -
>>>   	if (blk_mq_is_shared_tags(set->flags)) {
>>>   		if (q->elevator)
>>>   			blk_mq_tag_update_sched_shared_tags(q);
>>                          ^^^^
>>
>> Does blk_mq_tag_update_sched_shared_tags() use the wrong value here?
>>
>> In the old code, q->nr_requests was updated to nr before calling
>> blk_mq_tag_update_sched_shared_tags(). But in the new code, this
>> function is called while q->nr_requests still contains the old value.
>>
>> Looking at blk_mq_tag_update_sched_shared_tags():
>>
>>      void blk_mq_tag_update_sched_shared_tags(struct request_queue *q)
>>      {
>>          sbitmap_queue_resize(&q->sched_shared_tags->bitmap_tags,
>>                               q->nr_requests - q->tag_set->reserved_tags);
>>      }
>>
>> It reads q->nr_requests to calculate the new sbitmap size. With the
>> reordering, this will resize the sbitmap to the old depth instead of
>> the new depth passed in nr.
>>
> Good catch! Yes, I think this needs to be fixed...
> 

Yeah, I'll send a fix ASAP.

Thanks,
Kuai

> Thanks,
> --Nilay
> 
> 
> .
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2025-10-15  0:56 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-09-10  8:04 [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests Yu Kuai
2025-09-10  8:04 ` [PATCH v2 for-6.18/block 01/10] blk-mq: remove useless checking in queue_requests_store() Yu Kuai
2025-09-10  8:04 ` [PATCH v2 for-6.18/block 02/10] blk-mq: remove useless checkings in blk_mq_update_nr_requests() Yu Kuai
2025-09-10  8:04 ` [PATCH v2 for-6.18/block 03/10] blk-mq: check invalid nr_requests in queue_requests_store() Yu Kuai
2025-09-10  8:04 ` [PATCH v2 for-6.18/block 04/10] blk-mq: convert to serialize updating nr_requests with update_nr_hwq_lock Yu Kuai
2025-09-10 10:05   ` Nilay Shroff
2025-09-10  8:04 ` [PATCH v2 for-6.18/block 05/10] blk-mq: cleanup shared tags case in blk_mq_update_nr_requests() Yu Kuai
2025-10-14 13:05   ` Chris Mason
2025-10-14 18:04     ` Nilay Shroff
2025-10-15  0:56       ` Yu Kuai
2025-09-10  8:04 ` [PATCH v2 for-6.18/block 06/10] blk-mq: split bitmap grow and resize " Yu Kuai
2025-09-10 10:08   ` Nilay Shroff
2025-09-10  8:04 ` [PATCH v2 for-6.18/block 07/10] blk-mq-sched: add new parameter nr_requests in blk_mq_alloc_sched_tags() Yu Kuai
2025-09-10  8:04 ` [PATCH v2 for-6.18/block 08/10] blk-mq: fix potential deadlock while nr_requests grown Yu Kuai
2025-09-10 10:11   ` Nilay Shroff
2025-09-10  8:04 ` [PATCH v2 for-6.18/block 09/10] blk-mq: remove blk_mq_tag_update_depth() Yu Kuai
2025-09-10  8:04 ` [PATCH v2 for-6.18/block 10/10] blk-mq: fix stale nr_requests documentation Yu Kuai
2025-09-10 11:30 ` [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests Jens Axboe
