* [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests
@ 2025-09-10 8:04 Yu Kuai
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 01/10] blk-mq: remove useless checking in queue_requests_store() Yu Kuai
` (10 more replies)
0 siblings, 11 replies; 18+ messages in thread
From: Yu Kuai @ 2025-09-10 8:04 UTC (permalink / raw)
To: axboe, nilay, ming.lei
Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
Changes from v1:
- add comments explaining why q->elevator can be accessed without
holding elevator_lock in patch 4;
- add a non-shared tags check in patch 8;
- add review tags from Nilay to patches 1-5, 7, 9 and 10;
Yu Kuai (10):
blk-mq: remove useless checking in queue_requests_store()
blk-mq: remove useless checkings in blk_mq_update_nr_requests()
blk-mq: check invalid nr_requests in queue_requests_store()
blk-mq: convert to serialize updating nr_requests with
update_nr_hwq_lock
blk-mq: cleanup shared tags case in blk_mq_update_nr_requests()
blk-mq: split bitmap grow and resize case in
blk_mq_update_nr_requests()
blk-mq-sched: add new parameter nr_requests in
blk_mq_alloc_sched_tags()
blk-mq: fix potential deadlock while nr_requests grown
blk-mq: remove blk_mq_tag_update_depth()
blk-mq: fix stale nr_requests documentation
Documentation/ABI/stable/sysfs-block | 14 ++----
block/blk-mq-sched.c | 14 ++----
block/blk-mq-sched.h | 2 +-
block/blk-mq-tag.c | 53 ---------------------
block/blk-mq.c | 71 ++++++++++++++--------------
block/blk-mq.h | 18 +++++--
block/blk-sysfs.c | 60 ++++++++++++++++++-----
block/elevator.c | 3 +-
8 files changed, 112 insertions(+), 123 deletions(-)
--
2.39.2
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 for-6.18/block 01/10] blk-mq: remove useless checking in queue_requests_store()
2025-09-10 8:04 [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests Yu Kuai
@ 2025-09-10 8:04 ` Yu Kuai
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 02/10] blk-mq: remove useless checkings in blk_mq_update_nr_requests() Yu Kuai
` (9 subsequent siblings)
10 siblings, 0 replies; 18+ messages in thread
From: Yu Kuai @ 2025-09-10 8:04 UTC (permalink / raw)
To: axboe, nilay, ming.lei
Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
blk_mq_queue_attr_visible() already checks queue_is_mq(), so there is
no need to check it again in queue_requests_store().
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
---
block/blk-sysfs.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index c94b8b6ab024..1ffa65feca4f 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -69,9 +69,6 @@ queue_requests_store(struct gendisk *disk, const char *page, size_t count)
unsigned int memflags;
struct request_queue *q = disk->queue;
- if (!queue_is_mq(q))
- return -EINVAL;
-
ret = queue_var_store(&nr, page, count);
if (ret < 0)
return ret;
--
2.39.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 for-6.18/block 02/10] blk-mq: remove useless checkings in blk_mq_update_nr_requests()
2025-09-10 8:04 [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests Yu Kuai
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 01/10] blk-mq: remove useless checking in queue_requests_store() Yu Kuai
@ 2025-09-10 8:04 ` Yu Kuai
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 03/10] blk-mq: check invalid nr_requests in queue_requests_store() Yu Kuai
` (8 subsequent siblings)
10 siblings, 0 replies; 18+ messages in thread
From: Yu Kuai @ 2025-09-10 8:04 UTC (permalink / raw)
To: axboe, nilay, ming.lei
Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
1) queue_requests_store() is the only caller of
blk_mq_update_nr_requests(), and the queue is already frozen there, so
there is no need to check mq_freeze_depth;
2) q->tag_set must be set for request-based devices, and queue_is_mq()
is already checked in blk_mq_queue_attr_visible(), so there is no need
to check q->tag_set.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
---
block/blk-mq.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 31cc743ffad7..55ccc9f4435d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4930,21 +4930,14 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
{
struct blk_mq_tag_set *set = q->tag_set;
struct blk_mq_hw_ctx *hctx;
- int ret;
+ int ret = 0;
unsigned long i;
- if (WARN_ON_ONCE(!q->mq_freeze_depth))
- return -EINVAL;
-
- if (!set)
- return -EINVAL;
-
if (q->nr_requests == nr)
return 0;
blk_mq_quiesce_queue(q);
- ret = 0;
queue_for_each_hw_ctx(q, hctx, i) {
if (!hctx->tags)
continue;
--
2.39.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 for-6.18/block 03/10] blk-mq: check invalid nr_requests in queue_requests_store()
2025-09-10 8:04 [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests Yu Kuai
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 01/10] blk-mq: remove useless checking in queue_requests_store() Yu Kuai
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 02/10] blk-mq: remove useless checkings in blk_mq_update_nr_requests() Yu Kuai
@ 2025-09-10 8:04 ` Yu Kuai
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 04/10] blk-mq: convert to serialize updating nr_requests with update_nr_hwq_lock Yu Kuai
` (7 subsequent siblings)
10 siblings, 0 replies; 18+ messages in thread
From: Yu Kuai @ 2025-09-10 8:04 UTC (permalink / raw)
To: axboe, nilay, ming.lei
Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
queue_requests_store() is the only caller of
blk_mq_update_nr_requests(), and blk_mq_update_nr_requests() is the
only caller of blk_mq_tag_update_depth(); however, they all contain
checks of the nr_requests value given by the user.
Make the code cleaner by moving all the checks into the top-level
function:
1) nr_requests > reserved tags;
2) if there is an elevator, 4 <= nr_requests <= 2048;
3) if the elevator is none, 4 <= nr_requests <= tag_set->queue_depth;
Meanwhile, case 2 is the only case where tags can grow and -ENOMEM
might be returned.
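To make the moved checks concrete, here is a stand-alone sketch of the
validation described above; struct tag_set_model and nr_requests_valid()
are hypothetical names for illustration, not kernel code, and the bounds
mirror the ones quoted in this message (4 and 2048).

```c
#include <assert.h>
#include <stdbool.h>

#define BLKDEV_MIN_RQ 4
#define MAX_SCHED_RQ  2048	/* matches the "<= 2048" bound above */

struct tag_set_model {
	unsigned int reserved_tags;
	unsigned int queue_depth;
};

/* Returns true if nr passes the checks described above. */
static bool nr_requests_valid(const struct tag_set_model *set,
			      bool has_elevator, unsigned long nr)
{
	if (nr < BLKDEV_MIN_RQ)
		nr = BLKDEV_MIN_RQ;	/* the store path clamps small values */
	if (nr <= set->reserved_tags)
		return false;
	if (has_elevator && nr > MAX_SCHED_RQ)
		return false;
	if (!has_elevator && nr > set->queue_depth)
		return false;
	return true;
}
```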
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
---
block/blk-mq-tag.c | 16 +---------------
block/blk-mq.c | 8 ++------
block/blk-mq.h | 2 +-
block/blk-sysfs.c | 13 +++++++++++++
4 files changed, 17 insertions(+), 22 deletions(-)
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 086c67849e06..56f9bc839000 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -610,14 +610,10 @@ void blk_mq_free_tags(struct blk_mq_tag_set *set, struct blk_mq_tags *tags)
}
int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
- struct blk_mq_tags **tagsptr, unsigned int tdepth,
- bool can_grow)
+ struct blk_mq_tags **tagsptr, unsigned int tdepth)
{
struct blk_mq_tags *tags = *tagsptr;
- if (tdepth <= tags->nr_reserved_tags)
- return -EINVAL;
-
/*
* If we are allowed to grow beyond the original size, allocate
* a new set of tags before freeing the old one.
@@ -626,16 +622,6 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
struct blk_mq_tag_set *set = hctx->queue->tag_set;
struct blk_mq_tags *new;
- if (!can_grow)
- return -EINVAL;
-
- /*
- * We need some sort of upper limit, set it high enough that
- * no valid use cases should require more.
- */
- if (tdepth > MAX_SCHED_RQ)
- return -EINVAL;
-
/*
* Only the sbitmap needs resizing since we allocated the max
* initially.
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 55ccc9f4435d..9b97f2f3f2c9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4933,9 +4933,6 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
int ret = 0;
unsigned long i;
- if (q->nr_requests == nr)
- return 0;
-
blk_mq_quiesce_queue(q);
queue_for_each_hw_ctx(q, hctx, i) {
@@ -4947,10 +4944,9 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
*/
if (hctx->sched_tags) {
ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
- nr, true);
+ nr);
} else {
- ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
- false);
+ ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr);
}
if (ret)
goto out;
diff --git a/block/blk-mq.h b/block/blk-mq.h
index b96a753809ab..5d42c7d3a952 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -171,7 +171,7 @@ void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
unsigned int tag);
void blk_mq_put_tags(struct blk_mq_tags *tags, int *tag_array, int nr_tags);
int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
- struct blk_mq_tags **tags, unsigned int depth, bool can_grow);
+ struct blk_mq_tags **tags, unsigned int depth);
void blk_mq_tag_resize_shared_tags(struct blk_mq_tag_set *set,
unsigned int size);
void blk_mq_tag_update_sched_shared_tags(struct request_queue *q);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 1ffa65feca4f..f99519f7a820 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -75,12 +75,25 @@ queue_requests_store(struct gendisk *disk, const char *page, size_t count)
memflags = blk_mq_freeze_queue(q);
mutex_lock(&q->elevator_lock);
+
+ if (nr == q->nr_requests)
+ goto unlock;
+
if (nr < BLKDEV_MIN_RQ)
nr = BLKDEV_MIN_RQ;
+ if (nr <= q->tag_set->reserved_tags ||
+ (q->elevator && nr > MAX_SCHED_RQ) ||
+ (!q->elevator && nr > q->tag_set->queue_depth)) {
+ ret = -EINVAL;
+ goto unlock;
+ }
+
err = blk_mq_update_nr_requests(disk->queue, nr);
if (err)
ret = err;
+
+unlock:
mutex_unlock(&q->elevator_lock);
blk_mq_unfreeze_queue(q, memflags);
return ret;
--
2.39.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 for-6.18/block 04/10] blk-mq: convert to serialize updating nr_requests with update_nr_hwq_lock
2025-09-10 8:04 [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests Yu Kuai
` (2 preceding siblings ...)
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 03/10] blk-mq: check invalid nr_requests in queue_requests_store() Yu Kuai
@ 2025-09-10 8:04 ` Yu Kuai
2025-09-10 10:05 ` Nilay Shroff
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 05/10] blk-mq: cleanup shared tags case in blk_mq_update_nr_requests() Yu Kuai
` (6 subsequent siblings)
10 siblings, 1 reply; 18+ messages in thread
From: Yu Kuai @ 2025-09-10 8:04 UTC (permalink / raw)
To: axboe, nilay, ming.lei
Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
request_queue->nr_requests can be changed by:
a) switching elevator by updating nr_hw_queues
b) switching elevator by the elevator sysfs attribute
c) configuring the queue sysfs attribute nr_requests
The current lock order is:
1) update_nr_hwq_lock, cases a, b
2) freeze_queue
3) elevator_lock, cases a, b, c
Updating nr_requests is already serialized by elevator_lock; however,
in case c we'll have to allocate new sched_tags if nr_requests grows,
and doing this with elevator_lock held and the queue frozen risks
deadlock.
Hence use update_nr_hwq_lock instead, making it possible to allocate
memory if tags grow, while also preventing nr_requests from being
changed concurrently.
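The resulting ordering can be sketched as a small model; the step names
mirror the kernel calls, but nothing below takes real locks, it only
records the sequence so it can be checked.

```c
#include <assert.h>
#include <string.h>

/* Records the lock/unlock sequence; no real locks are taken. */
static const char *steps[8];
static int nsteps;

static void step(const char *name)
{
	steps[nsteps++] = name;
}

/* Models the queue_requests_store() flow after this patch. */
static void queue_requests_store_model(void)
{
	step("down_write(update_nr_hwq_lock)");	/* serialize vs. elevator switch */
	/* nr_requests is validated here, before the queue is frozen */
	step("blk_mq_freeze_queue");
	step("mutex_lock(elevator_lock)");
	/* blk_mq_update_nr_requests() would run here */
	step("mutex_unlock(elevator_lock)");
	step("blk_mq_unfreeze_queue");
	step("up_write(update_nr_hwq_lock)");
}
```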
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
block/blk-sysfs.c | 25 ++++++++++++++++++++-----
1 file changed, 20 insertions(+), 5 deletions(-)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index f99519f7a820..ff66d1b47169 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -68,13 +68,17 @@ queue_requests_store(struct gendisk *disk, const char *page, size_t count)
int ret, err;
unsigned int memflags;
struct request_queue *q = disk->queue;
+ struct blk_mq_tag_set *set = q->tag_set;
ret = queue_var_store(&nr, page, count);
if (ret < 0)
return ret;
- memflags = blk_mq_freeze_queue(q);
- mutex_lock(&q->elevator_lock);
+ /*
+ * Serialize updating nr_requests with concurrent queue_requests_store()
+ * and switching elevator.
+ */
+ down_write(&set->update_nr_hwq_lock);
if (nr == q->nr_requests)
goto unlock;
@@ -82,20 +86,31 @@ queue_requests_store(struct gendisk *disk, const char *page, size_t count)
if (nr < BLKDEV_MIN_RQ)
nr = BLKDEV_MIN_RQ;
- if (nr <= q->tag_set->reserved_tags ||
+ /*
+ * Switching elevator is protected by update_nr_hwq_lock:
+ * - read lock is held from elevator sysfs attribute;
+ * - write lock is held from updating nr_hw_queues;
+ * Hence it's safe to access q->elevator here with write lock held.
+ */
+ if (nr <= set->reserved_tags ||
(q->elevator && nr > MAX_SCHED_RQ) ||
- (!q->elevator && nr > q->tag_set->queue_depth)) {
+ (!q->elevator && nr > set->queue_depth)) {
ret = -EINVAL;
goto unlock;
}
+ memflags = blk_mq_freeze_queue(q);
+ mutex_lock(&q->elevator_lock);
+
err = blk_mq_update_nr_requests(disk->queue, nr);
if (err)
ret = err;
-unlock:
mutex_unlock(&q->elevator_lock);
blk_mq_unfreeze_queue(q, memflags);
+
+unlock:
+ up_write(&set->update_nr_hwq_lock);
return ret;
}
--
2.39.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 for-6.18/block 05/10] blk-mq: cleanup shared tags case in blk_mq_update_nr_requests()
2025-09-10 8:04 [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests Yu Kuai
` (3 preceding siblings ...)
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 04/10] blk-mq: convert to serialize updating nr_requests with update_nr_hwq_lock Yu Kuai
@ 2025-09-10 8:04 ` Yu Kuai
2025-10-14 13:05 ` Chris Mason
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 06/10] blk-mq: split bitmap grow and resize " Yu Kuai
` (5 subsequent siblings)
10 siblings, 1 reply; 18+ messages in thread
From: Yu Kuai @ 2025-09-10 8:04 UTC (permalink / raw)
To: axboe, nilay, ming.lei
Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
In the shared tags case, all hctx->sched_tags/tags point to the same
tags, so it makes no sense to call into blk_mq_tag_update_depth()
multiple times for the same tags.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
---
block/blk-mq-tag.c | 7 -------
block/blk-mq.c | 43 ++++++++++++++++++++++---------------------
2 files changed, 22 insertions(+), 28 deletions(-)
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 56f9bc839000..936b68273cad 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -622,13 +622,6 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
struct blk_mq_tag_set *set = hctx->queue->tag_set;
struct blk_mq_tags *new;
- /*
- * Only the sbitmap needs resizing since we allocated the max
- * initially.
- */
- if (blk_mq_is_shared_tags(set->flags))
- return 0;
-
new = blk_mq_alloc_map_and_rqs(set, hctx->queue_num, tdepth);
if (!new)
return -ENOMEM;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 9b97f2f3f2c9..80c20700bce8 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4935,34 +4935,35 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
blk_mq_quiesce_queue(q);
- queue_for_each_hw_ctx(q, hctx, i) {
- if (!hctx->tags)
- continue;
- /*
- * If we're using an MQ scheduler, just update the scheduler
- * queue depth. This is similar to what the old code would do.
- */
- if (hctx->sched_tags) {
- ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
- nr);
- } else {
- ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr);
- }
- if (ret)
- goto out;
- }
-
- q->nr_requests = nr;
- if (q->elevator && q->elevator->type->ops.depth_updated)
- q->elevator->type->ops.depth_updated(q);
-
if (blk_mq_is_shared_tags(set->flags)) {
if (q->elevator)
blk_mq_tag_update_sched_shared_tags(q);
else
blk_mq_tag_resize_shared_tags(set, nr);
+ } else {
+ queue_for_each_hw_ctx(q, hctx, i) {
+ if (!hctx->tags)
+ continue;
+ /*
+ * If we're using an MQ scheduler, just update the
+ * scheduler queue depth. This is similar to what the
+ * old code would do.
+ */
+ if (hctx->sched_tags)
+ ret = blk_mq_tag_update_depth(hctx,
+ &hctx->sched_tags, nr);
+ else
+ ret = blk_mq_tag_update_depth(hctx,
+ &hctx->tags, nr);
+ if (ret)
+ goto out;
+ }
}
+ q->nr_requests = nr;
+ if (q->elevator && q->elevator->type->ops.depth_updated)
+ q->elevator->type->ops.depth_updated(q);
+
out:
blk_mq_unquiesce_queue(q);
--
2.39.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 for-6.18/block 06/10] blk-mq: split bitmap grow and resize case in blk_mq_update_nr_requests()
2025-09-10 8:04 [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests Yu Kuai
` (4 preceding siblings ...)
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 05/10] blk-mq: cleanup shared tags case in blk_mq_update_nr_requests() Yu Kuai
@ 2025-09-10 8:04 ` Yu Kuai
2025-09-10 10:08 ` Nilay Shroff
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 07/10] blk-mq-sched: add new parameter nr_requests in blk_mq_alloc_sched_tags() Yu Kuai
` (4 subsequent siblings)
10 siblings, 1 reply; 18+ messages in thread
From: Yu Kuai @ 2025-09-10 8:04 UTC (permalink / raw)
To: axboe, nilay, ming.lei
Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
No functional changes are intended; this makes the code cleaner and
prepares for fixing the grow case in the following patches.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
block/blk-mq.c | 39 +++++++++++++++++++++++++++------------
1 file changed, 27 insertions(+), 12 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 80c20700bce8..299aac458185 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4936,25 +4936,40 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
blk_mq_quiesce_queue(q);
if (blk_mq_is_shared_tags(set->flags)) {
+ /*
+ * Shared tags, for sched tags, we allocate max initially hence
+ * tags can't grow, see blk_mq_alloc_sched_tags().
+ */
if (q->elevator)
blk_mq_tag_update_sched_shared_tags(q);
else
blk_mq_tag_resize_shared_tags(set, nr);
- } else {
+ } else if (!q->elevator) {
+ /*
+ * Non-shared hardware tags, nr is already checked from
+ * queue_requests_store() and tags can't grow.
+ */
queue_for_each_hw_ctx(q, hctx, i) {
if (!hctx->tags)
continue;
- /*
- * If we're using an MQ scheduler, just update the
- * scheduler queue depth. This is similar to what the
- * old code would do.
- */
- if (hctx->sched_tags)
- ret = blk_mq_tag_update_depth(hctx,
- &hctx->sched_tags, nr);
- else
- ret = blk_mq_tag_update_depth(hctx,
- &hctx->tags, nr);
+ sbitmap_queue_resize(&hctx->tags->bitmap_tags,
+ nr - hctx->tags->nr_reserved_tags);
+ }
+ } else if (nr <= q->elevator->et->nr_requests) {
+ /* Non-shared sched tags, and tags don't grow. */
+ queue_for_each_hw_ctx(q, hctx, i) {
+ if (!hctx->sched_tags)
+ continue;
+ sbitmap_queue_resize(&hctx->sched_tags->bitmap_tags,
+ nr - hctx->sched_tags->nr_reserved_tags);
+ }
+ } else {
+ /* Non-shared sched tags, and tags grow */
+ queue_for_each_hw_ctx(q, hctx, i) {
+ if (!hctx->sched_tags)
+ continue;
+ ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
+ nr);
if (ret)
goto out;
}
--
2.39.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 for-6.18/block 07/10] blk-mq-sched: add new parameter nr_requests in blk_mq_alloc_sched_tags()
2025-09-10 8:04 [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests Yu Kuai
` (5 preceding siblings ...)
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 06/10] blk-mq: split bitmap grow and resize " Yu Kuai
@ 2025-09-10 8:04 ` Yu Kuai
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 08/10] blk-mq: fix potential deadlock while nr_requests grown Yu Kuai
` (3 subsequent siblings)
10 siblings, 0 replies; 18+ messages in thread
From: Yu Kuai @ 2025-09-10 8:04 UTC (permalink / raw)
To: axboe, nilay, ming.lei
Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
This helper only supports allocating the default number of requests;
add a new parameter to support a specific number of requests.
This prepares for fixing a potential deadlock in the case where
nr_requests grows.
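Assuming the default depth formula kept by this patch (double the
smaller of the hw queue depth and BLKDEV_DEFAULT_RQ = 128, per hw
queue), a minimal model of the helper is:

```c
#include <assert.h>

#define BLKDEV_DEFAULT_RQ 128

/* Mirrors the per-hw-queue default: double the smaller of the hw
 * queue depth and 128. Stand-alone model, not the kernel helper. */
static unsigned int default_nr_requests(unsigned int queue_depth)
{
	unsigned int d = queue_depth < BLKDEV_DEFAULT_RQ ?
			 queue_depth : BLKDEV_DEFAULT_RQ;
	return 2 * d;
}
```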
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
---
block/blk-mq-sched.c | 14 +++++---------
block/blk-mq-sched.h | 2 +-
block/blk-mq.h | 11 +++++++++++
block/elevator.c | 3 ++-
4 files changed, 19 insertions(+), 11 deletions(-)
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index e2ce4a28e6c9..d06bb137a743 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -454,7 +454,7 @@ void blk_mq_free_sched_tags_batch(struct xarray *et_table,
}
struct elevator_tags *blk_mq_alloc_sched_tags(struct blk_mq_tag_set *set,
- unsigned int nr_hw_queues)
+ unsigned int nr_hw_queues, unsigned int nr_requests)
{
unsigned int nr_tags;
int i;
@@ -470,13 +470,8 @@ struct elevator_tags *blk_mq_alloc_sched_tags(struct blk_mq_tag_set *set,
nr_tags * sizeof(struct blk_mq_tags *), gfp);
if (!et)
return NULL;
- /*
- * Default to double of smaller one between hw queue_depth and
- * 128, since we don't split into sync/async like the old code
- * did. Additionally, this is a per-hw queue depth.
- */
- et->nr_requests = 2 * min_t(unsigned int, set->queue_depth,
- BLKDEV_DEFAULT_RQ);
+
+ et->nr_requests = nr_requests;
et->nr_hw_queues = nr_hw_queues;
if (blk_mq_is_shared_tags(set->flags)) {
@@ -521,7 +516,8 @@ int blk_mq_alloc_sched_tags_batch(struct xarray *et_table,
* concurrently.
*/
if (q->elevator) {
- et = blk_mq_alloc_sched_tags(set, nr_hw_queues);
+ et = blk_mq_alloc_sched_tags(set, nr_hw_queues,
+ blk_mq_default_nr_requests(set));
if (!et)
goto out_unwind;
if (xa_insert(et_table, q->id, et, gfp))
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index fe83187f41db..8e21a6b1415d 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -24,7 +24,7 @@ void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e);
void blk_mq_sched_free_rqs(struct request_queue *q);
struct elevator_tags *blk_mq_alloc_sched_tags(struct blk_mq_tag_set *set,
- unsigned int nr_hw_queues);
+ unsigned int nr_hw_queues, unsigned int nr_requests);
int blk_mq_alloc_sched_tags_batch(struct xarray *et_table,
struct blk_mq_tag_set *set, unsigned int nr_hw_queues);
void blk_mq_free_sched_tags(struct elevator_tags *et,
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 5d42c7d3a952..3a1d4c37d1bc 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -109,6 +109,17 @@ static inline struct blk_mq_hw_ctx *blk_mq_map_queue(blk_opf_t opf,
return ctx->hctxs[blk_mq_get_hctx_type(opf)];
}
+/*
+ * Default to double of smaller one between hw queue_depth and
+ * 128, since we don't split into sync/async like the old code
+ * did. Additionally, this is a per-hw queue depth.
+ */
+static inline unsigned int blk_mq_default_nr_requests(
+ struct blk_mq_tag_set *set)
+{
+ return 2 * min_t(unsigned int, set->queue_depth, BLKDEV_DEFAULT_RQ);
+}
+
/*
* sysfs helpers
*/
diff --git a/block/elevator.c b/block/elevator.c
index fe96c6f4753c..e2ebfbf107b3 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -669,7 +669,8 @@ static int elevator_change(struct request_queue *q, struct elv_change_ctx *ctx)
lockdep_assert_held(&set->update_nr_hwq_lock);
if (strncmp(ctx->name, "none", 4)) {
- ctx->et = blk_mq_alloc_sched_tags(set, set->nr_hw_queues);
+ ctx->et = blk_mq_alloc_sched_tags(set, set->nr_hw_queues,
+ blk_mq_default_nr_requests(set));
if (!ctx->et)
return -ENOMEM;
}
--
2.39.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 for-6.18/block 08/10] blk-mq: fix potential deadlock while nr_requests grown
2025-09-10 8:04 [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests Yu Kuai
` (6 preceding siblings ...)
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 07/10] blk-mq-sched: add new parameter nr_requests in blk_mq_alloc_sched_tags() Yu Kuai
@ 2025-09-10 8:04 ` Yu Kuai
2025-09-10 10:11 ` Nilay Shroff
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 09/10] blk-mq: remove blk_mq_tag_update_depth() Yu Kuai
` (2 subsequent siblings)
10 siblings, 1 reply; 18+ messages in thread
From: Yu Kuai @ 2025-09-10 8:04 UTC (permalink / raw)
To: axboe, nilay, ming.lei
Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
Allocating and freeing sched_tags while the queue is frozen can
deadlock[1]. This is a long-standing problem; hence allocate memory
before freezing the queue and free memory after the queue is
unfrozen.
[1] https://lore.kernel.org/all/0659ea8d-a463-47c8-9180-43c719e106eb@linux.ibm.com/
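The allocate-before-freeze pattern can be sketched with a toy model;
struct tags_model, alloc_tags() and grow_nr_requests() are hypothetical
illustrations of the ordering, not kernel code, and the "frozen" flag
stands in for blk_mq_freeze_queue().

```c
#include <assert.h>
#include <stdlib.h>

struct tags_model { unsigned int depth; };

static int frozen;	/* models the queue freeze state */

static struct tags_model *alloc_tags(unsigned int depth)
{
	assert(!frozen);	/* allocating while frozen is the deadlock risk */
	struct tags_model *t = malloc(sizeof(*t));
	assert(t);
	t->depth = depth;
	return t;
}

static unsigned int grow_nr_requests(struct tags_model **cur, unsigned int nr)
{
	struct tags_model *new = alloc_tags(nr);	/* before freeze */
	struct tags_model *old;

	frozen = 1;	/* blk_mq_freeze_queue() */
	old = *cur;	/* only the pointer swap runs frozen; it cannot fail */
	*cur = new;
	frozen = 0;	/* blk_mq_unfreeze_queue() */

	free(old);	/* old tags are freed after unfreeze */
	return (*cur)->depth;
}
```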
Fixes: e3a2b3f931f5 ("blk-mq: allow changing of queue depth through sysfs")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
block/blk-mq.c | 22 +++++++++-------------
block/blk-mq.h | 5 ++++-
block/blk-sysfs.c | 29 +++++++++++++++++++++--------
3 files changed, 34 insertions(+), 22 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 299aac458185..4ccf11cadf8c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4926,11 +4926,13 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
}
EXPORT_SYMBOL(blk_mq_free_tag_set);
-int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
+struct elevator_tags *blk_mq_update_nr_requests(struct request_queue *q,
+ struct elevator_tags *et,
+ unsigned int nr)
{
struct blk_mq_tag_set *set = q->tag_set;
+ struct elevator_tags *old_et = NULL;
struct blk_mq_hw_ctx *hctx;
- int ret = 0;
unsigned long i;
blk_mq_quiesce_queue(q);
@@ -4965,24 +4967,18 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
}
} else {
/* Non-shared sched tags, and tags grow */
- queue_for_each_hw_ctx(q, hctx, i) {
- if (!hctx->sched_tags)
- continue;
- ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
- nr);
- if (ret)
- goto out;
- }
+ queue_for_each_hw_ctx(q, hctx, i)
+ hctx->sched_tags = et->tags[i];
+ old_et = q->elevator->et;
+ q->elevator->et = et;
}
q->nr_requests = nr;
if (q->elevator && q->elevator->type->ops.depth_updated)
q->elevator->type->ops.depth_updated(q);
-out:
blk_mq_unquiesce_queue(q);
-
- return ret;
+ return old_et;
}
/*
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 3a1d4c37d1bc..0ef96f510477 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -6,6 +6,7 @@
#include "blk-stat.h"
struct blk_mq_tag_set;
+struct elevator_tags;
struct blk_mq_ctxs {
struct kobject kobj;
@@ -45,7 +46,9 @@ void blk_mq_submit_bio(struct bio *bio);
int blk_mq_poll(struct request_queue *q, blk_qc_t cookie, struct io_comp_batch *iob,
unsigned int flags);
void blk_mq_exit_queue(struct request_queue *q);
-int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr);
+struct elevator_tags *blk_mq_update_nr_requests(struct request_queue *q,
+ struct elevator_tags *tags,
+ unsigned int nr);
void blk_mq_wake_waiters(struct request_queue *q);
bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *,
bool);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index ff66d1b47169..76c47fe9b8d6 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -64,11 +64,12 @@ static ssize_t queue_requests_show(struct gendisk *disk, char *page)
static ssize_t
queue_requests_store(struct gendisk *disk, const char *page, size_t count)
{
- unsigned long nr;
- int ret, err;
- unsigned int memflags;
struct request_queue *q = disk->queue;
struct blk_mq_tag_set *set = q->tag_set;
+ struct elevator_tags *et = NULL;
+ unsigned int memflags;
+ unsigned long nr;
+ int ret;
ret = queue_var_store(&nr, page, count);
if (ret < 0)
@@ -99,16 +100,28 @@ queue_requests_store(struct gendisk *disk, const char *page, size_t count)
goto unlock;
}
+ if (!blk_mq_is_shared_tags(set->flags) && q->elevator &&
+ nr > q->elevator->et->nr_requests) {
+ /*
+ * Tags will grow, allocate memory before freezing queue to
+ * prevent deadlock.
+ */
+ et = blk_mq_alloc_sched_tags(set, q->nr_hw_queues, nr);
+ if (!et) {
+ ret = -ENOMEM;
+ goto unlock;
+ }
+ }
+
memflags = blk_mq_freeze_queue(q);
mutex_lock(&q->elevator_lock);
-
- err = blk_mq_update_nr_requests(disk->queue, nr);
- if (err)
- ret = err;
-
+ et = blk_mq_update_nr_requests(q, et, nr);
mutex_unlock(&q->elevator_lock);
blk_mq_unfreeze_queue(q, memflags);
+ if (et)
+ blk_mq_free_sched_tags(et, set);
+
unlock:
up_write(&set->update_nr_hwq_lock);
return ret;
--
2.39.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 for-6.18/block 09/10] blk-mq: remove blk_mq_tag_update_depth()
2025-09-10 8:04 [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests Yu Kuai
` (7 preceding siblings ...)
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 08/10] blk-mq: fix potential deadlock while nr_requests grown Yu Kuai
@ 2025-09-10 8:04 ` Yu Kuai
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 10/10] blk-mq: fix stale nr_requests documentation Yu Kuai
2025-09-10 11:30 ` [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests Jens Axboe
10 siblings, 0 replies; 18+ messages in thread
From: Yu Kuai @ 2025-09-10 8:04 UTC (permalink / raw)
To: axboe, nilay, ming.lei
Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
This helper is now unused.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
---
block/blk-mq-tag.c | 32 --------------------------------
block/blk-mq.h | 2 --
2 files changed, 34 deletions(-)
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 936b68273cad..a63d21a4aab4 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -609,38 +609,6 @@ void blk_mq_free_tags(struct blk_mq_tag_set *set, struct blk_mq_tags *tags)
call_srcu(&set->tags_srcu, &tags->rcu_head, blk_mq_free_tags_callback);
}
-int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
- struct blk_mq_tags **tagsptr, unsigned int tdepth)
-{
- struct blk_mq_tags *tags = *tagsptr;
-
- /*
- * If we are allowed to grow beyond the original size, allocate
- * a new set of tags before freeing the old one.
- */
- if (tdepth > tags->nr_tags) {
- struct blk_mq_tag_set *set = hctx->queue->tag_set;
- struct blk_mq_tags *new;
-
- new = blk_mq_alloc_map_and_rqs(set, hctx->queue_num, tdepth);
- if (!new)
- return -ENOMEM;
-
- blk_mq_free_map_and_rqs(set, *tagsptr, hctx->queue_num);
- hctx->queue->elevator->et->tags[hctx->queue_num] = new;
- *tagsptr = new;
- } else {
- /*
- * Don't need (or can't) update reserved tags here, they
- * remain static and should never need resizing.
- */
- sbitmap_queue_resize(&tags->bitmap_tags,
- tdepth - tags->nr_reserved_tags);
- }
-
- return 0;
-}
-
void blk_mq_tag_resize_shared_tags(struct blk_mq_tag_set *set, unsigned int size)
{
struct blk_mq_tags *tags = set->shared_tags;
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 0ef96f510477..af42dc018808 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -184,8 +184,6 @@ unsigned long blk_mq_get_tags(struct blk_mq_alloc_data *data, int nr_tags,
void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
unsigned int tag);
void blk_mq_put_tags(struct blk_mq_tags *tags, int *tag_array, int nr_tags);
-int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
- struct blk_mq_tags **tags, unsigned int depth);
void blk_mq_tag_resize_shared_tags(struct blk_mq_tag_set *set,
unsigned int size);
void blk_mq_tag_update_sched_shared_tags(struct request_queue *q);
--
2.39.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 for-6.18/block 10/10] blk-mq: fix stale nr_requests documentation
2025-09-10 8:04 [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests Yu Kuai
` (8 preceding siblings ...)
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 09/10] blk-mq: remove blk_mq_tag_update_depth() Yu Kuai
@ 2025-09-10 8:04 ` Yu Kuai
2025-09-10 11:30 ` [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests Jens Axboe
10 siblings, 0 replies; 18+ messages in thread
From: Yu Kuai @ 2025-09-10 8:04 UTC (permalink / raw)
To: axboe, nilay, ming.lei
Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
The nr_requests documentation still describes the long-removed single-queue
code path; remove that text and update it to match current blk-mq.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
---
Documentation/ABI/stable/sysfs-block | 14 ++++----------
1 file changed, 4 insertions(+), 10 deletions(-)
diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
index 0ddffc9133d0..0ed10aeff86b 100644
--- a/Documentation/ABI/stable/sysfs-block
+++ b/Documentation/ABI/stable/sysfs-block
@@ -603,16 +603,10 @@ Date: July 2003
Contact: linux-block@vger.kernel.org
Description:
[RW] This controls how many requests may be allocated in the
- block layer for read or write requests. Note that the total
- allocated number may be twice this amount, since it applies only
- to reads or writes (not the accumulated sum).
-
- To avoid priority inversion through request starvation, a
- request queue maintains a separate request pool per each cgroup
- when CONFIG_BLK_CGROUP is enabled, and this parameter applies to
- each such per-block-cgroup request pool. IOW, if there are N
- block cgroups, each request queue may have up to N request
- pools, each independently regulated by nr_requests.
+ block layer. Note that this value only represents the quantity for
+ a single blk_mq_tags instance. The actual number for the entire
+ device depends on the hardware queue count, whether an elevator is
+ enabled, and whether tags are shared.
What: /sys/block/<disk>/queue/nr_zones
--
2.39.2
* Re: [PATCH v2 for-6.18/block 04/10] blk-mq: convert to serialize updating nr_requests with update_nr_hwq_lock
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 04/10] blk-mq: convert to serialize updating nr_requests with update_nr_hwq_lock Yu Kuai
@ 2025-09-10 10:05 ` Nilay Shroff
0 siblings, 0 replies; 18+ messages in thread
From: Nilay Shroff @ 2025-09-10 10:05 UTC (permalink / raw)
To: Yu Kuai, axboe, ming.lei
Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
johnny.chenyi
On 9/10/25 1:34 PM, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> request_queue->nr_requests can be changed by:
>
> a) switch elevator by updating nr_hw_queues
> b) switch elevator by elevator sysfs attribute
> c) configure queue sysfs attribute nr_requests
>
> Current lock order is:
>
> 1) update_nr_hwq_lock, case a,b
> 2) freeze_queue
> 3) elevator_lock, case a,b,c
>
> And updating nr_requests is serialized by elevator_lock() already;
> however, in case c, we'll have to allocate new sched_tags if
> nr_requests grows, and doing this with elevator_lock held and the
> queue frozen risks deadlock.
>
> Hence use update_nr_hwq_lock instead, making it possible to allocate
> memory if tags grow, while also preventing nr_requests from being
> changed concurrently.
>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Looks good to me:
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
* Re: [PATCH v2 for-6.18/block 06/10] blk-mq: split bitmap grow and resize case in blk_mq_update_nr_requests()
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 06/10] blk-mq: split bitmap grow and resize " Yu Kuai
@ 2025-09-10 10:08 ` Nilay Shroff
0 siblings, 0 replies; 18+ messages in thread
From: Nilay Shroff @ 2025-09-10 10:08 UTC (permalink / raw)
To: Yu Kuai, axboe, ming.lei
Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
johnny.chenyi
On 9/10/25 1:34 PM, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> No functional changes are intended; this makes the code cleaner and
> prepares to fix the grow case in the following patches.
>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Looks good to me:
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
* Re: [PATCH v2 for-6.18/block 08/10] blk-mq: fix potential deadlock while nr_requests grown
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 08/10] blk-mq: fix potential deadlock while nr_requests grown Yu Kuai
@ 2025-09-10 10:11 ` Nilay Shroff
0 siblings, 0 replies; 18+ messages in thread
From: Nilay Shroff @ 2025-09-10 10:11 UTC (permalink / raw)
To: Yu Kuai, axboe, ming.lei
Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
johnny.chenyi
On 9/10/25 1:34 PM, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> Allocating and freeing sched_tags while the queue is frozen can
> deadlock[1]; this is a long-standing problem, hence allocate memory
> before freezing the queue and free memory after the queue is unfrozen.
>
> [1] https://lore.kernel.org/all/0659ea8d-a463-47c8-9180-43c719e106eb@linux.ibm.com/
> Fixes: e3a2b3f931f5 ("blk-mq: allow changing of queue depth through sysfs")
>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Looks good to me:
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
* Re: [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests
2025-09-10 8:04 [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests Yu Kuai
` (9 preceding siblings ...)
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 10/10] blk-mq: fix stale nr_requests documentation Yu Kuai
@ 2025-09-10 11:30 ` Jens Axboe
10 siblings, 0 replies; 18+ messages in thread
From: Jens Axboe @ 2025-09-10 11:30 UTC (permalink / raw)
To: nilay, ming.lei, Yu Kuai
Cc: linux-block, linux-kernel, yukuai3, yi.zhang, yangerkun,
johnny.chenyi
On Wed, 10 Sep 2025 16:04:35 +0800, Yu Kuai wrote:
> Changes from v1:
> - add comments explaining accessing q->elevator without holding
> elevator_lock in patch 4;
> - add non-shared checking in patch 8;
> - add review tag by Nilay, patch 1-5,7,9,10;
>
> Yu Kuai (10):
> blk-mq: remove useless checking in queue_requests_store()
> blk-mq: remove useless checkings in blk_mq_update_nr_requests()
> blk-mq: check invalid nr_requests in queue_requests_store()
> blk-mq: convert to serialize updating nr_requests with
> update_nr_hwq_lock
> blk-mq: cleanup shared tags case in blk_mq_update_nr_requests()
> blk-mq: split bitmap grow and resize case in
> blk_mq_update_nr_requests()
> blk-mq-sched: add new parameter nr_requests in
> blk_mq_alloc_sched_tags()
> blk-mq: fix potential deadlock while nr_requests grown
> blk-mq: remove blk_mq_tag_update_depth()
> blk-mq: fix stale nr_requests documentation
>
> [...]
Applied, thanks!
[01/10] blk-mq: remove useless checking in queue_requests_store()
commit: dc1dd13d44fa4e4d466476c0f3517c1230c237e4
[02/10] blk-mq: remove useless checkings in blk_mq_update_nr_requests()
commit: 8bd7195fea6d9662aa3b32498a3828bfd9b63185
[03/10] blk-mq: check invalid nr_requests in queue_requests_store()
commit: b46d4c447db76e36906ed59ebb9b3ef8f3383322
[04/10] blk-mq: convert to serialize updating nr_requests with update_nr_hwq_lock
commit: 626ff4f8ebcb7207f01e7810acb85812ccf06bd8
[05/10] blk-mq: cleanup shared tags case in blk_mq_update_nr_requests()
commit: 7f2799c546dba9e12f9ff4d07936601e416c640d
[06/10] blk-mq: split bitmap grow and resize case in blk_mq_update_nr_requests()
commit: e63200404477456ec60c62dd8b3b1092aba2e211
[07/10] blk-mq-sched: add new parameter nr_requests in blk_mq_alloc_sched_tags()
commit: 6293e336f6d7d3f3415346ce34993b3398846166
[08/10] blk-mq: fix potential deadlock while nr_requests grown
commit: b86433721f46d934940528f28d49c1dedb690df1
[09/10] blk-mq: remove blk_mq_tag_update_depth()
commit: 9784041145796994f2b21f4c7e628d7c9db762f4
[10/10] blk-mq: fix stale nr_requests documentation
commit: a75fe12fa2e2f96b619f25b8cda1fdef6d616ab1
Best regards,
--
Jens Axboe
* Re: [PATCH v2 for-6.18/block 05/10] blk-mq: cleanup shared tags case in blk_mq_update_nr_requests()
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 05/10] blk-mq: cleanup shared tags case in blk_mq_update_nr_requests() Yu Kuai
@ 2025-10-14 13:05 ` Chris Mason
2025-10-14 18:04 ` Nilay Shroff
0 siblings, 1 reply; 18+ messages in thread
From: Chris Mason @ 2025-10-14 13:05 UTC (permalink / raw)
To: Yu Kuai
Cc: Chris Mason, axboe, nilay, ming.lei, linux-block, linux-kernel,
yukuai3, yi.zhang, yangerkun, johnny.chenyi
Hi everyone,
My review automation flagged this one in linux-next, and it looks like a
valid bug to me:
On Wed, 10 Sep 2025 16:04:40 +0800 Yu Kuai <yukuai1@huaweicloud.com> wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> For the shared tags case, all hctx->sched_tags/tags are the same, so it
> doesn't make sense to call into blk_mq_tag_update_depth() multiple times
> for the same tags.
>
[ ... ]
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 9b97f2f3f2c9..80c20700bce8 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -4935,34 +4935,35 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>
> blk_mq_quiesce_queue(q);
>
> - queue_for_each_hw_ctx(q, hctx, i) {
> - if (!hctx->tags)
> - continue;
> - /*
> - * If we're using an MQ scheduler, just update the scheduler
> - * queue depth. This is similar to what the old code would do.
> - */
> - if (hctx->sched_tags) {
> - ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
> - nr);
> - } else {
> - ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr);
> - }
> - if (ret)
> - goto out;
> - }
> -
> - q->nr_requests = nr;
> - if (q->elevator && q->elevator->type->ops.depth_updated)
> - q->elevator->type->ops.depth_updated(q);
> -
> if (blk_mq_is_shared_tags(set->flags)) {
> if (q->elevator)
> blk_mq_tag_update_sched_shared_tags(q);
^^^^
Does blk_mq_tag_update_sched_shared_tags() use the wrong value here?
In the old code, q->nr_requests was updated to nr before calling
blk_mq_tag_update_sched_shared_tags(). But in the new code, this
function is called while q->nr_requests still contains the old value.
Looking at blk_mq_tag_update_sched_shared_tags():
void blk_mq_tag_update_sched_shared_tags(struct request_queue *q)
{
sbitmap_queue_resize(&q->sched_shared_tags->bitmap_tags,
q->nr_requests - q->tag_set->reserved_tags);
}
It reads q->nr_requests to calculate the new sbitmap size. With the
reordering, this will resize the sbitmap to the old depth instead of
the new depth passed in nr.
-chris
* Re: [PATCH v2 for-6.18/block 05/10] blk-mq: cleanup shared tags case in blk_mq_update_nr_requests()
2025-10-14 13:05 ` Chris Mason
@ 2025-10-14 18:04 ` Nilay Shroff
2025-10-15 0:56 ` Yu Kuai
0 siblings, 1 reply; 18+ messages in thread
From: Nilay Shroff @ 2025-10-14 18:04 UTC (permalink / raw)
To: Chris Mason, Yu Kuai
Cc: axboe, ming.lei, linux-block, linux-kernel, yukuai3, yi.zhang,
yangerkun, johnny.chenyi
On 10/14/25 6:35 PM, Chris Mason wrote:
> Hi everyone,
>
> My review automation flagged this one in linux-next, and it looks like a
> valid bug to me:
>
> On Wed, 10 Sep 2025 16:04:40 +0800 Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> For shared tags case, all hctx->sched_tags/tags are the same, it doesn't
>> make sense to call into blk_mq_tag_update_depth() multiple times for the
>> same tags.
>>
>
> [ ... ]
>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 9b97f2f3f2c9..80c20700bce8 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -4935,34 +4935,35 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>>
>> blk_mq_quiesce_queue(q);
>>
>> - queue_for_each_hw_ctx(q, hctx, i) {
>> - if (!hctx->tags)
>> - continue;
>> - /*
>> - * If we're using an MQ scheduler, just update the scheduler
>> - * queue depth. This is similar to what the old code would do.
>> - */
>> - if (hctx->sched_tags) {
>> - ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
>> - nr);
>> - } else {
>> - ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr);
>> - }
>> - if (ret)
>> - goto out;
>> - }
>> -
>> - q->nr_requests = nr;
>> - if (q->elevator && q->elevator->type->ops.depth_updated)
>> - q->elevator->type->ops.depth_updated(q);
>> -
>> if (blk_mq_is_shared_tags(set->flags)) {
>> if (q->elevator)
>> blk_mq_tag_update_sched_shared_tags(q);
> ^^^^
>
> Does blk_mq_tag_update_sched_shared_tags() use the wrong value here?
>
> In the old code, q->nr_requests was updated to nr before calling
> blk_mq_tag_update_sched_shared_tags(). But in the new code, this
> function is called while q->nr_requests still contains the old value.
>
> Looking at blk_mq_tag_update_sched_shared_tags():
>
> void blk_mq_tag_update_sched_shared_tags(struct request_queue *q)
> {
> sbitmap_queue_resize(&q->sched_shared_tags->bitmap_tags,
> q->nr_requests - q->tag_set->reserved_tags);
> }
>
> It reads q->nr_requests to calculate the new sbitmap size. With the
> reordering, this will resize the sbitmap to the old depth instead of
> the new depth passed in nr.
>
Good catch! Yes, I think this needs to be fixed...
Thanks,
--Nilay
* Re: [PATCH v2 for-6.18/block 05/10] blk-mq: cleanup shared tags case in blk_mq_update_nr_requests()
2025-10-14 18:04 ` Nilay Shroff
@ 2025-10-15 0:56 ` Yu Kuai
0 siblings, 0 replies; 18+ messages in thread
From: Yu Kuai @ 2025-10-15 0:56 UTC (permalink / raw)
To: Nilay Shroff, Chris Mason, Yu Kuai
Cc: axboe, ming.lei, linux-block, linux-kernel, yi.zhang, yangerkun,
johnny.chenyi, yukuai (C)
Hi,
On 2025/10/15 2:04, Nilay Shroff wrote:
>
>
> On 10/14/25 6:35 PM, Chris Mason wrote:
>> Hi everyone,
>>
>> My review automation flagged this one in linux-next, and it looks like a
>> valid bug to me:
>>
>> On Wed, 10 Sep 2025 16:04:40 +0800 Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>>> From: Yu Kuai <yukuai3@huawei.com>
>>>
>>> For shared tags case, all hctx->sched_tags/tags are the same, it doesn't
>>> make sense to call into blk_mq_tag_update_depth() multiple times for the
>>> same tags.
>>>
>>
>> [ ... ]
>>
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index 9b97f2f3f2c9..80c20700bce8 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -4935,34 +4935,35 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>>>
>>> blk_mq_quiesce_queue(q);
>>>
>>> - queue_for_each_hw_ctx(q, hctx, i) {
>>> - if (!hctx->tags)
>>> - continue;
>>> - /*
>>> - * If we're using an MQ scheduler, just update the scheduler
>>> - * queue depth. This is similar to what the old code would do.
>>> - */
>>> - if (hctx->sched_tags) {
>>> - ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
>>> - nr);
>>> - } else {
>>> - ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr);
>>> - }
>>> - if (ret)
>>> - goto out;
>>> - }
>>> -
>>> - q->nr_requests = nr;
>>> - if (q->elevator && q->elevator->type->ops.depth_updated)
>>> - q->elevator->type->ops.depth_updated(q);
>>> -
>>> if (blk_mq_is_shared_tags(set->flags)) {
>>> if (q->elevator)
>>> blk_mq_tag_update_sched_shared_tags(q);
>> ^^^^
>>
>> Does blk_mq_tag_update_sched_shared_tags() use the wrong value here?
>>
>> In the old code, q->nr_requests was updated to nr before calling
>> blk_mq_tag_update_sched_shared_tags(). But in the new code, this
>> function is called while q->nr_requests still contains the old value.
>>
>> Looking at blk_mq_tag_update_sched_shared_tags():
>>
>> void blk_mq_tag_update_sched_shared_tags(struct request_queue *q)
>> {
>> sbitmap_queue_resize(&q->sched_shared_tags->bitmap_tags,
>> q->nr_requests - q->tag_set->reserved_tags);
>> }
>>
>> It reads q->nr_requests to calculate the new sbitmap size. With the
>> reordering, this will resize the sbitmap to the old depth instead of
>> the new depth passed in nr.
>>
> Good catch! Yes, I think this needs to be fixed...
>
Yeah, I'll send a fix ASAP.
Thanks,
Kuai
> Thanks,
> --Nilay
>
>
> .
>
Thread overview: 18+ messages
2025-09-10 8:04 [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests Yu Kuai
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 01/10] blk-mq: remove useless checking in queue_requests_store() Yu Kuai
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 02/10] blk-mq: remove useless checkings in blk_mq_update_nr_requests() Yu Kuai
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 03/10] blk-mq: check invalid nr_requests in queue_requests_store() Yu Kuai
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 04/10] blk-mq: convert to serialize updating nr_requests with update_nr_hwq_lock Yu Kuai
2025-09-10 10:05 ` Nilay Shroff
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 05/10] blk-mq: cleanup shared tags case in blk_mq_update_nr_requests() Yu Kuai
2025-10-14 13:05 ` Chris Mason
2025-10-14 18:04 ` Nilay Shroff
2025-10-15 0:56 ` Yu Kuai
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 06/10] blk-mq: split bitmap grow and resize " Yu Kuai
2025-09-10 10:08 ` Nilay Shroff
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 07/10] blk-mq-sched: add new parameter nr_requests in blk_mq_alloc_sched_tags() Yu Kuai
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 08/10] blk-mq: fix potential deadlock while nr_requests grown Yu Kuai
2025-09-10 10:11 ` Nilay Shroff
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 09/10] blk-mq: remove blk_mq_tag_update_depth() Yu Kuai
2025-09-10 8:04 ` [PATCH v2 for-6.18/block 10/10] blk-mq: fix stale nr_requests documentation Yu Kuai
2025-09-10 11:30 ` [PATCH v2 for-6.18/block 00/10] blk-mq: cleanup and fixes for updating nr_requests Jens Axboe