* [PATCH v3 0/2] blk-mq: fix update nr_requests regressions
@ 2025-08-21 6:06 Yu Kuai
2025-08-21 6:06 ` [PATCH v3 1/2] blk-mq: fix elevator depth_updated method Yu Kuai
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Yu Kuai @ 2025-08-21 6:06 UTC (permalink / raw)
To: yukuai3, axboe, bvanassche, ming.lei, nilay, hare
Cc: linux-block, linux-kernel, yukuai1, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
Changes in v3:
- call depth_updated() directly in init_sched() method in patch 1;
- fix typos in patch 2;
- add review for patch 2;
Changes in v2:
- instead of refactoring, cleaning up, and thoroughly fixing the
nr_requests update path, fix the regression in patch 2 the easy way, and
delay the refactoring and cleanups to the next merge window.
Patch 1 fixes a regression where the elevator async_depth is not updated
correctly when nr_requests changes; the problem first appeared in the
error path, then in mq-deadline, and most recently in bfq and kyber.
Patch 2 fixes a regression where growing nr_requests makes the kernel
panic due to a double free of the tags.
Yu Kuai (2):
blk-mq: fix elevator depth_updated method
blk-mq: fix blk_mq_tags double free while nr_requests grown
block/bfq-iosched.c | 22 +++++-----------------
block/blk-mq-sched.h | 11 +++++++++++
block/blk-mq-tag.c | 1 +
block/blk-mq.c | 23 ++++++++++++-----------
block/elevator.h | 2 +-
block/kyber-iosched.c | 19 +++++++++----------
block/mq-deadline.c | 16 +++-------------
7 files changed, 42 insertions(+), 52 deletions(-)
--
2.39.2
* [PATCH v3 1/2] blk-mq: fix elevator depth_updated method
2025-08-21 6:06 [PATCH v3 0/2] blk-mq: fix update nr_requests regressions Yu Kuai
@ 2025-08-21 6:06 ` Yu Kuai
2025-08-21 12:21 ` Nilay Shroff
` (2 more replies)
2025-08-21 6:06 ` [PATCH v3 2/2] blk-mq: fix blk_mq_tags double free while nr_requests grown Yu Kuai
2025-08-26 6:27 ` [PATCH v3 0/2] blk-mq: fix update nr_requests regressions Yu Kuai
2 siblings, 3 replies; 9+ messages in thread
From: Yu Kuai @ 2025-08-21 6:06 UTC (permalink / raw)
To: yukuai3, axboe, bvanassche, ming.lei, nilay, hare
Cc: linux-block, linux-kernel, yukuai1, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
Current depth_updated() has some problems:
1) depth_updated() is called for each hctx, while all elevators update
async_depth at the disk level, which is not related to any hctx;
2) In blk_mq_update_nr_requests(), if the update succeeds for a previous
hctx and fails for the current one, q->nr_requests is not updated, while
async_depth has already been updated with the new nr_requests by the
previous depth_updated() call;
3) All elevators now use q->nr_requests to calculate async_depth;
however, q->nr_requests still holds the old value when depth_updated() is
called from blk_mq_update_nr_requests();
These problems were introduced first in the error path, then in
mq-deadline, and recently in bfq and kyber. Fix them by:
- passing in request_queue instead of hctx;
- moving the depth_updated() call after q->nr_requests is updated in
blk_mq_update_nr_requests();
- adding a depth_updated() call inside the init_sched() method to
initialize async_depth;
- removing the init_hctx() method from mq-deadline and bfq, where it is
now useless;
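As an illustration of problem 3) and of the reordering listed above, here
is a minimal user-space sketch. It is not kernel code: struct model_queue,
model_resize_hctx(), model_update_old() and model_update_new() are
hypothetical names, and the 75% ratio is only an assumed stand-in for an
elevator's async_depth formula.

```c
#include <assert.h>

#define NR_HCTX 4
#define ASYNC_PERCENT 75U	/* assumed ratio, for illustration only */

/* Simplified stand-in for struct request_queue plus elevator data. */
struct model_queue {
	unsigned int nr_requests;
	unsigned int async_depth;
};

/* Queue-level callback: derives async_depth from q->nr_requests. */
static void model_depth_updated(struct model_queue *q)
{
	q->async_depth = q->nr_requests * ASYNC_PERCENT / 100U;
}

/* Hypothetical per-hctx resize; fails for index fail_at (-1: never). */
static int model_resize_hctx(int idx, int fail_at)
{
	return idx == fail_at ? -1 : 0;
}

/* Old ordering: callback runs per hctx, before q->nr_requests is set. */
int model_update_old(struct model_queue *q, unsigned int nr, int fail_at)
{
	int ret = 0;

	for (int i = 0; i < NR_HCTX; i++) {
		ret = model_resize_hctx(i, fail_at);
		if (ret)
			break;
		model_depth_updated(q);	/* still sees the old nr_requests */
	}
	if (!ret)
		q->nr_requests = nr;
	return ret;
}

/* New ordering: bail out on error, then update once at queue level. */
int model_update_new(struct model_queue *q, unsigned int nr, int fail_at)
{
	for (int i = 0; i < NR_HCTX; i++) {
		int ret = model_resize_hctx(i, fail_at);

		if (ret)
			return ret;	/* nothing has been changed yet */
	}
	q->nr_requests = nr;
	model_depth_updated(q);		/* sees the new nr_requests */
	return 0;
}
```

In this model, growing nr_requests from 64 to 128 with the old ordering
leaves async_depth at 48 (computed from the stale 64), while the new
ordering yields 96. If a hctx resize fails, the new ordering leaves both
fields untouched.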
Fixes: 77f1e0a52d26 ("bfq: update internal depth state when queue depth changes")
Fixes: 39823b47bbd4 ("block/mq-deadline: Fix the tag reservation code")
Fixes: 42e6c6ce03fd ("lib/sbitmap: convert shallow_depth from one word to the whole sbitmap")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
block/bfq-iosched.c | 22 +++++-----------------
block/blk-mq-sched.h | 11 +++++++++++
block/blk-mq.c | 23 ++++++++++++-----------
block/elevator.h | 2 +-
block/kyber-iosched.c | 19 +++++++++----------
block/mq-deadline.c | 16 +++-------------
6 files changed, 41 insertions(+), 52 deletions(-)
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 50e51047e1fe..4a8d3d96bfe4 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -7109,9 +7109,10 @@ void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg)
* See the comments on bfq_limit_depth for the purpose of
* the depths set in the function. Return minimum shallow depth we'll use.
*/
-static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt)
+static void bfq_depth_updated(struct request_queue *q)
{
- unsigned int nr_requests = bfqd->queue->nr_requests;
+ struct bfq_data *bfqd = q->elevator->elevator_data;
+ unsigned int nr_requests = q->nr_requests;
/*
* In-word depths if no bfq_queue is being weight-raised:
@@ -7143,21 +7144,8 @@ static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt)
bfqd->async_depths[1][0] = max((nr_requests * 3) >> 4, 1U);
/* no more than ~37% of tags for sync writes (~20% extra tags) */
bfqd->async_depths[1][1] = max((nr_requests * 6) >> 4, 1U);
-}
-
-static void bfq_depth_updated(struct blk_mq_hw_ctx *hctx)
-{
- struct bfq_data *bfqd = hctx->queue->elevator->elevator_data;
- struct blk_mq_tags *tags = hctx->sched_tags;
- bfq_update_depths(bfqd, &tags->bitmap_tags);
- sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, 1);
-}
-
-static int bfq_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int index)
-{
- bfq_depth_updated(hctx);
- return 0;
+ blk_mq_set_min_shallow_depth(q, 1);
}
static void bfq_exit_queue(struct elevator_queue *e)
@@ -7369,6 +7357,7 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_queue *eq)
goto out_free;
bfq_init_root_group(bfqd->root_group, bfqd);
bfq_init_entity(&bfqd->oom_bfqq.entity, bfqd->root_group);
+ bfq_depth_updated(q);
/* We dispatch from request queue wide instead of hw queue */
blk_queue_flag_set(QUEUE_FLAG_SQ_SCHED, q);
@@ -7628,7 +7617,6 @@ static struct elevator_type iosched_bfq_mq = {
.request_merged = bfq_request_merged,
.has_work = bfq_has_work,
.depth_updated = bfq_depth_updated,
- .init_hctx = bfq_init_hctx,
.init_sched = bfq_init_queue,
.exit_sched = bfq_exit_queue,
},
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index b554e1d55950..fe83187f41db 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -92,4 +92,15 @@ static inline bool blk_mq_sched_needs_restart(struct blk_mq_hw_ctx *hctx)
return test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
}
+static inline void blk_mq_set_min_shallow_depth(struct request_queue *q,
+ unsigned int depth)
+{
+ struct blk_mq_hw_ctx *hctx;
+ unsigned long i;
+
+ queue_for_each_hw_ctx(q, hctx, i)
+ sbitmap_queue_min_shallow_depth(&hctx->sched_tags->bitmap_tags,
+ depth);
+}
+
#endif
diff --git a/block/blk-mq.c b/block/blk-mq.c
index b67d6c02eceb..9c68749124c6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4951,20 +4951,21 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
false);
}
if (ret)
- break;
- if (q->elevator && q->elevator->type->ops.depth_updated)
- q->elevator->type->ops.depth_updated(hctx);
+ goto out;
}
- if (!ret) {
- q->nr_requests = nr;
- if (blk_mq_is_shared_tags(set->flags)) {
- if (q->elevator)
- blk_mq_tag_update_sched_shared_tags(q);
- else
- blk_mq_tag_resize_shared_tags(set, nr);
- }
+
+ q->nr_requests = nr;
+ if (q->elevator && q->elevator->type->ops.depth_updated)
+ q->elevator->type->ops.depth_updated(q);
+
+ if (blk_mq_is_shared_tags(set->flags)) {
+ if (q->elevator)
+ blk_mq_tag_update_sched_shared_tags(q);
+ else
+ blk_mq_tag_resize_shared_tags(set, nr);
}
+out:
blk_mq_unquiesce_queue(q);
return ret;
diff --git a/block/elevator.h b/block/elevator.h
index adc5c157e17e..c4d20155065e 100644
--- a/block/elevator.h
+++ b/block/elevator.h
@@ -37,7 +37,7 @@ struct elevator_mq_ops {
void (*exit_sched)(struct elevator_queue *);
int (*init_hctx)(struct blk_mq_hw_ctx *, unsigned int);
void (*exit_hctx)(struct blk_mq_hw_ctx *, unsigned int);
- void (*depth_updated)(struct blk_mq_hw_ctx *);
+ void (*depth_updated)(struct request_queue *);
bool (*allow_merge)(struct request_queue *, struct request *, struct bio *);
bool (*bio_merge)(struct request_queue *, struct bio *, unsigned int);
diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index 70cbc7b2deb4..18efd6ef2a2b 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -399,6 +399,14 @@ static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q)
return ERR_PTR(ret);
}
+static void kyber_depth_updated(struct request_queue *q)
+{
+ struct kyber_queue_data *kqd = q->elevator->elevator_data;
+
+ kqd->async_depth = q->nr_requests * KYBER_ASYNC_PERCENT / 100U;
+ blk_mq_set_min_shallow_depth(q, kqd->async_depth);
+}
+
static int kyber_init_sched(struct request_queue *q, struct elevator_queue *eq)
{
struct kyber_queue_data *kqd;
@@ -413,6 +421,7 @@ static int kyber_init_sched(struct request_queue *q, struct elevator_queue *eq)
eq->elevator_data = kqd;
q->elevator = eq;
+ kyber_depth_updated(q);
return 0;
}
@@ -440,15 +449,6 @@ static void kyber_ctx_queue_init(struct kyber_ctx_queue *kcq)
INIT_LIST_HEAD(&kcq->rq_list[i]);
}
-static void kyber_depth_updated(struct blk_mq_hw_ctx *hctx)
-{
- struct kyber_queue_data *kqd = hctx->queue->elevator->elevator_data;
- struct blk_mq_tags *tags = hctx->sched_tags;
-
- kqd->async_depth = hctx->queue->nr_requests * KYBER_ASYNC_PERCENT / 100U;
- sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, kqd->async_depth);
-}
-
static int kyber_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
{
struct kyber_hctx_data *khd;
@@ -493,7 +493,6 @@ static int kyber_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
khd->batching = 0;
hctx->sched_data = khd;
- kyber_depth_updated(hctx);
return 0;
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index b9b7cdf1d3c9..2e689b2c4021 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -507,22 +507,12 @@ static void dd_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data)
}
/* Called by blk_mq_update_nr_requests(). */
-static void dd_depth_updated(struct blk_mq_hw_ctx *hctx)
+static void dd_depth_updated(struct request_queue *q)
{
- struct request_queue *q = hctx->queue;
struct deadline_data *dd = q->elevator->elevator_data;
- struct blk_mq_tags *tags = hctx->sched_tags;
dd->async_depth = q->nr_requests;
-
- sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, 1);
-}
-
-/* Called by blk_mq_init_hctx() and blk_mq_init_sched(). */
-static int dd_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
-{
- dd_depth_updated(hctx);
- return 0;
+ blk_mq_set_min_shallow_depth(q, 1);
}
static void dd_exit_sched(struct elevator_queue *e)
@@ -587,6 +577,7 @@ static int dd_init_sched(struct request_queue *q, struct elevator_queue *eq)
blk_queue_flag_set(QUEUE_FLAG_SQ_SCHED, q);
q->elevator = eq;
+ dd_depth_updated(q);
return 0;
}
@@ -1048,7 +1039,6 @@ static struct elevator_type mq_deadline = {
.has_work = dd_has_work,
.init_sched = dd_init_sched,
.exit_sched = dd_exit_sched,
- .init_hctx = dd_init_hctx,
},
#ifdef CONFIG_BLK_DEBUG_FS
--
2.39.2
* [PATCH v3 2/2] blk-mq: fix blk_mq_tags double free while nr_requests grown
2025-08-21 6:06 [PATCH v3 0/2] blk-mq: fix update nr_requests regressions Yu Kuai
2025-08-21 6:06 ` [PATCH v3 1/2] blk-mq: fix elevator depth_updated method Yu Kuai
@ 2025-08-21 6:06 ` Yu Kuai
2025-08-21 14:57 ` Li Nan
2025-08-22 6:02 ` Hannes Reinecke
2025-08-26 6:27 ` [PATCH v3 0/2] blk-mq: fix update nr_requests regressions Yu Kuai
2 siblings, 2 replies; 9+ messages in thread
From: Yu Kuai @ 2025-08-21 6:06 UTC (permalink / raw)
To: yukuai3, axboe, bvanassche, ming.lei, nilay, hare
Cc: linux-block, linux-kernel, yukuai1, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
When a user grows the tags through the queue sysfs attribute nr_requests,
hctx->sched_tags is freed directly and replaced with newly allocated
tags, see blk_mq_tag_update_depth().
The problem is that hctx->sched_tags comes from elevator->et->tags, while
et->tags still points to the freed tags; hence a later elevator exit will
try to free the tags again, causing a kernel panic.
Fix this problem by replacing et->tags with the newly allocated tags as
well.
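The aliasing between hctx->sched_tags and et->tags can be sketched with a
small user-space model. Everything below is hypothetical (the model_*
names, the static pool, the free counter); it only mimics the pointer
bookkeeping and counts frees instead of releasing memory, so that a
double free shows up as a counter value rather than a crash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_HCTX 2
#define POOL_SIZE 8

struct model_tags {
	unsigned int depth;
	int free_count;		/* > 1 means a double free happened */
};

/* Stand-in for elevator->et: owns the tags, freed at elevator exit. */
struct model_et {
	struct model_tags *tags[NR_HCTX];
};

/* Stand-in for a hctx: sched_tags aliases et->tags[queue_num]. */
struct model_hctx {
	unsigned int queue_num;
	struct model_tags *sched_tags;
};

static struct model_tags pool[POOL_SIZE];
static size_t pool_used;

static struct model_tags *model_alloc_tags(unsigned int depth)
{
	struct model_tags *t;

	if (pool_used == POOL_SIZE)
		return NULL;
	t = &pool[pool_used++];
	t->depth = depth;
	t->free_count = 0;
	return t;
}

static void model_free_tags(struct model_tags *t)
{
	t->free_count++;	/* record the free instead of releasing */
}

/* Grow path: free the old tags and install new ones. */
static int model_grow(struct model_et *et, struct model_hctx *hctx,
		      unsigned int depth, bool sync_et)
{
	struct model_tags *new = model_alloc_tags(depth);

	if (!new)
		return -1;
	model_free_tags(hctx->sched_tags);
	if (sync_et)		/* the one-line fix: keep et->tags in sync */
		et->tags[hctx->queue_num] = new;
	hctx->sched_tags = new;
	return 0;
}

/* Grow every hctx, then exit the elevator; return the worst free count. */
int model_run(bool sync_et)
{
	struct model_et et;
	struct model_hctx hctx[NR_HCTX];
	int worst = 0;

	pool_used = 0;
	for (unsigned int i = 0; i < NR_HCTX; i++) {
		et.tags[i] = model_alloc_tags(64);
		hctx[i].queue_num = i;
		hctx[i].sched_tags = et.tags[i];
	}
	for (unsigned int i = 0; i < NR_HCTX; i++)
		if (model_grow(&et, &hctx[i], 128, sync_et))
			return -1;
	for (unsigned int i = 0; i < NR_HCTX; i++) {	/* elevator exit */
		model_free_tags(et.tags[i]);
		if (et.tags[i]->free_count > worst)
			worst = et.tags[i]->free_count;
	}
	return worst;
}
```

In this model, model_run(false) returns 2 because the elevator exit frees
the stale et->tags entries a second time, while model_run(true) returns 1
because each tags object is freed exactly once.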
Note that there are still some long-term problems that will require some
refactoring to be fixed thoroughly[1].
[1] https://lore.kernel.org/all/20250815080216.410665-1-yukuai1@huaweicloud.com/
Fixes: f5a6604f7a44 ("block: fix lockdep warning caused by lock dependency in elv_iosched_store")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
---
block/blk-mq-tag.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index d880c50629d6..5cffa5668d0c 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -622,6 +622,7 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
return -ENOMEM;
blk_mq_free_map_and_rqs(set, *tagsptr, hctx->queue_num);
+ hctx->queue->elevator->et->tags[hctx->queue_num] = new;
*tagsptr = new;
} else {
/*
--
2.39.2
* Re: [PATCH v3 1/2] blk-mq: fix elevator depth_updated method
2025-08-21 6:06 ` [PATCH v3 1/2] blk-mq: fix elevator depth_updated method Yu Kuai
@ 2025-08-21 12:21 ` Nilay Shroff
2025-08-21 14:57 ` Li Nan
2025-08-22 6:00 ` Hannes Reinecke
2 siblings, 0 replies; 9+ messages in thread
From: Nilay Shroff @ 2025-08-21 12:21 UTC (permalink / raw)
To: Yu Kuai, yukuai3, axboe, bvanassche, ming.lei, hare
Cc: linux-block, linux-kernel, yi.zhang, yangerkun, johnny.chenyi
On 8/21/25 11:36 AM, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> Current depth_updated has some problems:
>
> 1) depth_updated() will be called for each hctx, while all elevators
> will update async_depth for the disk level, this is not related to hctx;
> 2) In blk_mq_update_nr_requests(), if previous hctx update succeed and
> this hctx update failed, q->nr_requests will not be updated, while
> async_depth is already updated with new nr_requests in previous
> depth_updated();
> 3) All elevators are using q->nr_requests to calculate async_depth now,
> however, q->nr_requests is still the old value when depth_updated() is
> called from blk_mq_update_nr_requests();
>
> Those problems are first from error path, then mq-deadline, and recently
> for bfq and kyber, fix those problems by:
>
> - pass in request_queue instead of hctx;
> - move depth_updated() after q->nr_requests is updated in
> blk_mq_update_nr_requests();
> - add depth_updated() call inside init_sched() method to initialize
> async_depth;
> - remove init_hctx() method for mq-deadline and bfq that is useless now;
>
> Fixes: 77f1e0a52d26 ("bfq: update internal depth state when queue depth changes")
> Fixes: 39823b47bbd4 ("block/mq-deadline: Fix the tag reservation code")
> Fixes: 42e6c6ce03fd ("lib/sbitmap: convert shallow_depth from one word to the whole sbitmap")
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Looks good to me:
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
* Re: [PATCH v3 1/2] blk-mq: fix elevator depth_updated method
2025-08-21 6:06 ` [PATCH v3 1/2] blk-mq: fix elevator depth_updated method Yu Kuai
2025-08-21 12:21 ` Nilay Shroff
@ 2025-08-21 14:57 ` Li Nan
2025-08-22 6:00 ` Hannes Reinecke
2 siblings, 0 replies; 9+ messages in thread
From: Li Nan @ 2025-08-21 14:57 UTC (permalink / raw)
To: Yu Kuai, yukuai3, axboe, bvanassche, ming.lei, nilay, hare
Cc: linux-block, linux-kernel, yi.zhang, yangerkun, johnny.chenyi
On 2025/8/21 14:06, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> Current depth_updated has some problems:
>
> 1) depth_updated() will be called for each hctx, while all elevators
> will update async_depth for the disk level, this is not related to hctx;
> 2) In blk_mq_update_nr_requests(), if previous hctx update succeed and
> this hctx update failed, q->nr_requests will not be updated, while
> async_depth is already updated with new nr_requests in previous
> depth_updated();
> 3) All elevators are using q->nr_requests to calculate async_depth now,
> however, q->nr_requests is still the old value when depth_updated() is
> called from blk_mq_update_nr_requests();
>
> Those problems are first from error path, then mq-deadline, and recently
> for bfq and kyber, fix those problems by:
>
> - pass in request_queue instead of hctx;
> - move depth_updated() after q->nr_requests is updated in
> blk_mq_update_nr_requests();
> - add depth_updated() call inside init_sched() method to initialize
> async_depth;
> - remove init_hctx() method for mq-deadline and bfq that is useless now;
>
> Fixes: 77f1e0a52d26 ("bfq: update internal depth state when queue depth changes")
> Fixes: 39823b47bbd4 ("block/mq-deadline: Fix the tag reservation code")
> Fixes: 42e6c6ce03fd ("lib/sbitmap: convert shallow_depth from one word to the whole sbitmap")
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
> block/bfq-iosched.c | 22 +++++-----------------
> block/blk-mq-sched.h | 11 +++++++++++
> block/blk-mq.c | 23 ++++++++++++-----------
> block/elevator.h | 2 +-
> block/kyber-iosched.c | 19 +++++++++----------
> block/mq-deadline.c | 16 +++-------------
> 6 files changed, 41 insertions(+), 52 deletions(-)
>
LGTM
Reviewed-by: Li Nan <linan122@huawei.com>
--
Thanks,
Nan
* Re: [PATCH v3 2/2] blk-mq: fix blk_mq_tags double free while nr_requests grown
2025-08-21 6:06 ` [PATCH v3 2/2] blk-mq: fix blk_mq_tags double free while nr_requests grown Yu Kuai
@ 2025-08-21 14:57 ` Li Nan
2025-08-22 6:02 ` Hannes Reinecke
1 sibling, 0 replies; 9+ messages in thread
From: Li Nan @ 2025-08-21 14:57 UTC (permalink / raw)
To: Yu Kuai, yukuai3, axboe, bvanassche, ming.lei, nilay, hare
Cc: linux-block, linux-kernel, yi.zhang, yangerkun, johnny.chenyi
On 2025/8/21 14:06, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> In the case user trigger tags grow by queue sysfs attribute nr_requests,
> hctx->sched_tags will be freed directly and replaced with a new
> allocated tags, see blk_mq_tag_update_depth().
>
> The problem is that hctx->sched_tags is from elevator->et->tags, while
> et->tags is still the freed tags, hence later elevator exit will try to
> free the tags again, causing kernel panic.
>
> Fix this problem by replacing et->tags with new allocated tags as well.
>
> Noted there are still some long term problems that will require some
> refactor to be fixed thoroughly[1].
>
> [1] https://lore.kernel.org/all/20250815080216.410665-1-yukuai1@huaweicloud.com/
> Fixes: f5a6604f7a44 ("block: fix lockdep warning caused by lock dependency in elv_iosched_store")
>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
> ---
LGTM
Reviewed-by: Li Nan <linan122@huawei.com>
--
Thanks,
Nan
* Re: [PATCH v3 1/2] blk-mq: fix elevator depth_updated method
2025-08-21 6:06 ` [PATCH v3 1/2] blk-mq: fix elevator depth_updated method Yu Kuai
2025-08-21 12:21 ` Nilay Shroff
2025-08-21 14:57 ` Li Nan
@ 2025-08-22 6:00 ` Hannes Reinecke
2 siblings, 0 replies; 9+ messages in thread
From: Hannes Reinecke @ 2025-08-22 6:00 UTC (permalink / raw)
To: Yu Kuai, yukuai3, axboe, bvanassche, ming.lei, nilay
Cc: linux-block, linux-kernel, yi.zhang, yangerkun, johnny.chenyi
On 8/21/25 08:06, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> Current depth_updated has some problems:
>
> 1) depth_updated() will be called for each hctx, while all elevators
> will update async_depth for the disk level, this is not related to hctx;
> 2) In blk_mq_update_nr_requests(), if previous hctx update succeed and
> this hctx update failed, q->nr_requests will not be updated, while
> async_depth is already updated with new nr_requests in previous
> depth_updated();
> 3) All elevators are using q->nr_requests to calculate async_depth now,
> however, q->nr_requests is still the old value when depth_updated() is
> called from blk_mq_update_nr_requests();
>
> Those problems are first from error path, then mq-deadline, and recently
> for bfq and kyber, fix those problems by:
>
> - pass in request_queue instead of hctx;
> - move depth_updated() after q->nr_requests is updated in
> blk_mq_update_nr_requests();
> - add depth_updated() call inside init_sched() method to initialize
> async_depth;
> - remove init_hctx() method for mq-deadline and bfq that is useless now;
>
> Fixes: 77f1e0a52d26 ("bfq: update internal depth state when queue depth changes")
> Fixes: 39823b47bbd4 ("block/mq-deadline: Fix the tag reservation code")
> Fixes: 42e6c6ce03fd ("lib/sbitmap: convert shallow_depth from one word to the whole sbitmap")
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
> block/bfq-iosched.c | 22 +++++-----------------
> block/blk-mq-sched.h | 11 +++++++++++
> block/blk-mq.c | 23 ++++++++++++-----------
> block/elevator.h | 2 +-
> block/kyber-iosched.c | 19 +++++++++----------
> block/mq-deadline.c | 16 +++-------------
> 6 files changed, 41 insertions(+), 52 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: [PATCH v3 2/2] blk-mq: fix blk_mq_tags double free while nr_requests grown
2025-08-21 6:06 ` [PATCH v3 2/2] blk-mq: fix blk_mq_tags double free while nr_requests grown Yu Kuai
2025-08-21 14:57 ` Li Nan
@ 2025-08-22 6:02 ` Hannes Reinecke
1 sibling, 0 replies; 9+ messages in thread
From: Hannes Reinecke @ 2025-08-22 6:02 UTC (permalink / raw)
To: Yu Kuai, yukuai3, axboe, bvanassche, ming.lei, nilay
Cc: linux-block, linux-kernel, yi.zhang, yangerkun, johnny.chenyi
On 8/21/25 08:06, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> In the case user trigger tags grow by queue sysfs attribute nr_requests,
> hctx->sched_tags will be freed directly and replaced with a new
> allocated tags, see blk_mq_tag_update_depth().
>
> The problem is that hctx->sched_tags is from elevator->et->tags, while
> et->tags is still the freed tags, hence later elevator exit will try to
> free the tags again, causing kernel panic.
>
> Fix this problem by replacing et->tags with new allocated tags as well.
>
> Noted there are still some long term problems that will require some
> refactor to be fixed thoroughly[1].
>
> [1] https://lore.kernel.org/all/20250815080216.410665-1-yukuai1@huaweicloud.com/
> Fixes: f5a6604f7a44 ("block: fix lockdep warning caused by lock dependency in elv_iosched_store")
>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
> ---
> block/blk-mq-tag.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index d880c50629d6..5cffa5668d0c 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -622,6 +622,7 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
> return -ENOMEM;
>
> blk_mq_free_map_and_rqs(set, *tagsptr, hctx->queue_num);
> + hctx->queue->elevator->et->tags[hctx->queue_num] = new;
> *tagsptr = new;
> } else {
> /*
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: [PATCH v3 0/2] blk-mq: fix update nr_requests regressions
2025-08-21 6:06 [PATCH v3 0/2] blk-mq: fix update nr_requests regressions Yu Kuai
2025-08-21 6:06 ` [PATCH v3 1/2] blk-mq: fix elevator depth_updated method Yu Kuai
2025-08-21 6:06 ` [PATCH v3 2/2] blk-mq: fix blk_mq_tags double free while nr_requests grown Yu Kuai
@ 2025-08-26 6:27 ` Yu Kuai
2 siblings, 0 replies; 9+ messages in thread
From: Yu Kuai @ 2025-08-26 6:27 UTC (permalink / raw)
To: Yu Kuai, axboe, bvanassche, ming.lei, nilay, hare
Cc: linux-block, linux-kernel, yi.zhang, yangerkun, johnny.chenyi,
yukuai (C)
Hi, Jens
On 2025/08/21 14:06, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> Changes in v3:
> - call depth_updated() directly in init_sched() method in patch 1;
> - fix typos in patch 2;
> - add review for patch 2;
> Changes in v2:
> - instead of refactor and cleanups and fix updating nr_requests
> thoroughly, fix the regression in patch 2 the easy way, and delay
> refactor and cleanups to next merge window.
>
> patch 1 fix regression that elevator async_depth is not updated correctly
> if nr_requests changes, first from error path and then for mq-deadline,
> and recently for bfq and kyber.
>
> patch 2 fix regression that if nr_requests grow, kernel will panic due
> to tags double free.
>
> Yu Kuai (2):
> blk-mq: fix elevator depth_updated method
> blk-mq: fix blk_mq_tags double free while nr_requests grown
>
> block/bfq-iosched.c | 22 +++++-----------------
> block/blk-mq-sched.h | 11 +++++++++++
> block/blk-mq-tag.c | 1 +
> block/blk-mq.c | 23 ++++++++++++-----------
> block/elevator.h | 2 +-
> block/kyber-iosched.c | 19 +++++++++----------
> block/mq-deadline.c | 16 +++-------------
> 7 files changed, 42 insertions(+), 52 deletions(-)
>
Friendly ping; please consider this set for this merge window.
BTW, I see that the for-6.18/block branch was created; however, I have
a pending set[1] for the next merge window that will conflict with this
set. I'm not sure whether you want to rebase for-6.18/block on block-6.17
or handle the conflicts later for 6.18-rc1.
[1]
https://lore.kernel.org/all/20250815080216.410665-1-yukuai1@huaweicloud.com/
Thanks,
Kuai