* [PATCHv6 0/3] block: move sched_tags allocation/de-allocation outside of locking context
@ 2025-06-30 5:21 Nilay Shroff
2025-06-30 5:21 ` [PATCHv6 1/3] block: move elevator queue allocation logic into blk_mq_init_sched Nilay Shroff
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Nilay Shroff @ 2025-06-30 5:21 UTC (permalink / raw)
To: linux-block; +Cc: hch, ming.lei, axboe, sth, lkp, gjoyce
Hi,
There have been a few reports[1] indicating potential lockdep warnings due
to a lock dependency from the percpu allocator to the elevator lock. This
patch series aims to eliminate that dependency.
The series consists of three patches:
The first patch is a preparatory patch that moves the elevator queue
allocation logic from ->init_sched into blk_mq_init_sched.
The second patch in the series restructures sched_tags allocation and
deallocation during elevator update/switch operations to ensure these
actions are performed entirely outside the ->freeze_lock and
->elevator_lock. This eliminates the percpu allocator's lock dependency
on the elevator and freeze locks during scheduler transitions.
The third patch introduces batch allocation and deallocation helpers for
sched_tags. These helpers are used during __blk_mq_update_nr_hw_queues()
to decouple sched_tags memory management from both the elevator and freeze
locks, addressing the lockdep concerns in the nr_hw_queues update path.
[1] https://lore.kernel.org/all/0659ea8d-a463-47c8-9180-43c719e106eb@linux.ibm.com/
Changes since v5:
- Fixed smatch warning reported by kernel test robot here:
https://lore.kernel.org/all/202506300509.2S1tygch-lkp@intel.com/
Link to v5: https://lore.kernel.org/all/20250627175544.1063910-1-nilay@linux.ibm.com/
Changes since v4:
- Define a local Xarray variable in __blk_mq_update_nr_hw_queues to store
sched_tags, instead of storing it in an Xarray defined in 'struct elevator_tags'
(Ming Lei)
Link to v4: https://lore.kernel.org/all/20250624131716.630465-1-nilay@linux.ibm.com/
Changes since v3:
- Further split the patchset into a three-patch series so that we can
have a separate patch for sched_tags batch allocation/deallocation
(Ming Lei)
- Use Xarray to store and load the sched_tags (Ming Lei)
- Unexport elevator_alloc() as we no longer need to use it outside
of block layer core (hch)
- Unwind the sched_tags allocation and free tags when it fails in
the middle of allocation (hch)
- Move the struct elevator_queue definition from the common header to
elevator.c as there's no user of it outside elevator.c (Ming Lei, hch)
Link to v3: https://lore.kernel.org/all/20250616173233.3803824-1-nilay@linux.ibm.com/
Changes since v2:
- Split the patch into a two-patch series. The first patch updates the
->init_sched elevator API and the second patch handles the sched
tags allocation/de-allocation logic (Ming Lei)
- Address sched tags allocation/deallocation logic while running in the
context of nr_hw_queue update so that we can handle all possible
scenarios in a single patchset (Ming Lei)
Link to v2: https://lore.kernel.org/all/20250528123638.1029700-1-nilay@linux.ibm.com/
Changes since v1:
- As the lifetimes of the elevator queue and sched tags are the same,
allocate and move sched tags under struct elevator_queue (Ming Lei)
Link to v1: https://lore.kernel.org/all/20250520103425.1259712-1-nilay@linux.ibm.com/
Nilay Shroff (3):
block: move elevator queue allocation logic into blk_mq_init_sched
block: fix lockdep warning caused by lock dependency in
elv_iosched_store
block: fix potential deadlock while running nr_hw_queue update
block/bfq-iosched.c | 13 +--
block/blk-mq-sched.c | 220 ++++++++++++++++++++++++++++--------------
block/blk-mq-sched.h | 12 ++-
block/blk-mq.c | 11 ++-
block/blk.h | 2 +-
block/elevator.c | 50 ++++++++--
block/elevator.h | 16 ++-
block/kyber-iosched.c | 11 +--
block/mq-deadline.c | 14 +--
9 files changed, 234 insertions(+), 115 deletions(-)
--
2.49.0
^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCHv6 1/3] block: move elevator queue allocation logic into blk_mq_init_sched
  2025-06-30  5:21 [PATCHv6 0/3] block: move sched_tags allocation/de-allocation outside of locking context Nilay Shroff
@ 2025-06-30  5:21 ` Nilay Shroff
  2025-06-30  6:13   ` Christoph Hellwig
  2025-06-30  6:17   ` Hannes Reinecke
  2025-06-30  5:21 ` [PATCHv6 2/3] block: fix lockdep warning caused by lock dependency in elv_iosched_store Nilay Shroff
  2025-06-30  5:21 ` [PATCHv6 3/3] block: fix potential deadlock while running nr_hw_queue update Nilay Shroff
  2 siblings, 2 replies; 15+ messages in thread
From: Nilay Shroff @ 2025-06-30  5:21 UTC (permalink / raw)
  To: linux-block; +Cc: hch, ming.lei, axboe, sth, lkp, gjoyce

In preparation for allocating sched_tags before freezing the request
queue and acquiring ->elevator_lock, move the elevator queue allocation
logic from the elevator ops ->init_sched callback into blk_mq_init_sched.
As elevator_alloc is now only invoked from block layer core, we don't
need to export it, so unexport elevator_alloc function.

This refactoring provides a centralized location for elevator queue
initialization, which makes it easier to store pre-allocated sched_tags
in the struct elevator_queue during later changes.
Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> --- block/bfq-iosched.c | 13 +++---------- block/blk-mq-sched.c | 11 ++++++++--- block/elevator.c | 1 - block/elevator.h | 2 +- block/kyber-iosched.c | 11 ++--------- block/mq-deadline.c | 14 ++------------ 6 files changed, 16 insertions(+), 36 deletions(-) diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c index 0cb1e9873aab..fd26dc1901b0 100644 --- a/block/bfq-iosched.c +++ b/block/bfq-iosched.c @@ -7232,22 +7232,16 @@ static void bfq_init_root_group(struct bfq_group *root_group, root_group->sched_data.bfq_class_idle_last_service = jiffies; } -static int bfq_init_queue(struct request_queue *q, struct elevator_type *e) +static int bfq_init_queue(struct request_queue *q, struct elevator_queue *eq) { struct bfq_data *bfqd; - struct elevator_queue *eq; unsigned int i; struct blk_independent_access_ranges *ia_ranges = q->disk->ia_ranges; - eq = elevator_alloc(q, e); - if (!eq) - return -ENOMEM; - bfqd = kzalloc_node(sizeof(*bfqd), GFP_KERNEL, q->node); - if (!bfqd) { - kobject_put(&eq->kobj); + if (!bfqd) return -ENOMEM; - } + eq->elevator_data = bfqd; spin_lock_irq(&q->queue_lock); @@ -7405,7 +7399,6 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e) out_free: kfree(bfqd); - kobject_put(&eq->kobj); return -ENOMEM; } diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c index 55a0fd105147..359e0704e09b 100644 --- a/block/blk-mq-sched.c +++ b/block/blk-mq-sched.c @@ -475,10 +475,14 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e) q->nr_requests = 2 * min_t(unsigned int, q->tag_set->queue_depth, BLKDEV_DEFAULT_RQ); + eq = elevator_alloc(q, e); + if (!eq) + return -ENOMEM; + if (blk_mq_is_shared_tags(flags)) { ret = blk_mq_init_sched_shared_tags(q); if (ret) - return ret; + goto err_put_elevator; } queue_for_each_hw_ctx(q, hctx, i) { @@ -487,7 +491,7 @@ int blk_mq_init_sched(struct request_queue *q, struct 
elevator_type *e) goto err_free_map_and_rqs; } - ret = e->ops.init_sched(q, e); + ret = e->ops.init_sched(q, eq); if (ret) goto err_free_map_and_rqs; @@ -508,7 +512,8 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e) err_free_map_and_rqs: blk_mq_sched_free_rqs(q); blk_mq_sched_tags_teardown(q, flags); - +err_put_elevator: + kobject_put(&eq->kobj); q->elevator = NULL; return ret; } diff --git a/block/elevator.c b/block/elevator.c index ab22542e6cf0..770874040f79 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -148,7 +148,6 @@ struct elevator_queue *elevator_alloc(struct request_queue *q, return eq; } -EXPORT_SYMBOL(elevator_alloc); static void elevator_release(struct kobject *kobj) { diff --git a/block/elevator.h b/block/elevator.h index a07ce773a38f..a4de5f9ad790 100644 --- a/block/elevator.h +++ b/block/elevator.h @@ -24,7 +24,7 @@ struct blk_mq_alloc_data; struct blk_mq_hw_ctx; struct elevator_mq_ops { - int (*init_sched)(struct request_queue *, struct elevator_type *); + int (*init_sched)(struct request_queue *, struct elevator_queue *); void (*exit_sched)(struct elevator_queue *); int (*init_hctx)(struct blk_mq_hw_ctx *, unsigned int); void (*exit_hctx)(struct blk_mq_hw_ctx *, unsigned int); diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c index 4dba8405bd01..7b6832cb3a8d 100644 --- a/block/kyber-iosched.c +++ b/block/kyber-iosched.c @@ -402,20 +402,13 @@ static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q) return ERR_PTR(ret); } -static int kyber_init_sched(struct request_queue *q, struct elevator_type *e) +static int kyber_init_sched(struct request_queue *q, struct elevator_queue *eq) { struct kyber_queue_data *kqd; - struct elevator_queue *eq; - - eq = elevator_alloc(q, e); - if (!eq) - return -ENOMEM; kqd = kyber_queue_data_alloc(q); - if (IS_ERR(kqd)) { - kobject_put(&eq->kobj); + if (IS_ERR(kqd)) return PTR_ERR(kqd); - } blk_stat_enable_accounting(q); diff --git a/block/mq-deadline.c 
b/block/mq-deadline.c index 2edf1cac06d5..7b6caf30e00a 100644 --- a/block/mq-deadline.c +++ b/block/mq-deadline.c @@ -568,20 +568,14 @@ static void dd_exit_sched(struct elevator_queue *e) /* * initialize elevator private data (deadline_data). */ -static int dd_init_sched(struct request_queue *q, struct elevator_type *e) +static int dd_init_sched(struct request_queue *q, struct elevator_queue *eq) { struct deadline_data *dd; - struct elevator_queue *eq; enum dd_prio prio; - int ret = -ENOMEM; - - eq = elevator_alloc(q, e); - if (!eq) - return ret; dd = kzalloc_node(sizeof(*dd), GFP_KERNEL, q->node); if (!dd) - goto put_eq; + return -ENOMEM; eq->elevator_data = dd; @@ -608,10 +602,6 @@ static int dd_init_sched(struct request_queue *q, struct elevator_type *e) q->elevator = eq; return 0; - -put_eq: - kobject_put(&eq->kobj); - return ret; } /* -- 2.50.0 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCHv6 1/3] block: move elevator queue allocation logic into blk_mq_init_sched
  2025-06-30  5:21 ` [PATCHv6 1/3] block: move elevator queue allocation logic into blk_mq_init_sched Nilay Shroff
@ 2025-06-30  6:13   ` Christoph Hellwig
  2025-06-30  6:17   ` Hannes Reinecke
  1 sibling, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2025-06-30  6:13 UTC (permalink / raw)
  To: Nilay Shroff; +Cc: linux-block, ming.lei, axboe, sth, lkp, gjoyce

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [PATCHv6 1/3] block: move elevator queue allocation logic into blk_mq_init_sched
  2025-06-30  5:21 ` [PATCHv6 1/3] block: move elevator queue allocation logic into blk_mq_init_sched Nilay Shroff
  2025-06-30  6:13   ` Christoph Hellwig
@ 2025-06-30  6:17   ` Hannes Reinecke
  1 sibling, 0 replies; 15+ messages in thread
From: Hannes Reinecke @ 2025-06-30  6:17 UTC (permalink / raw)
  To: Nilay Shroff, linux-block; +Cc: hch, ming.lei, axboe, sth, lkp, gjoyce

On 6/30/25 07:21, Nilay Shroff wrote:
> In preparation for allocating sched_tags before freezing the request
> queue and acquiring ->elevator_lock, move the elevator queue allocation
> logic from the elevator ops ->init_sched callback into blk_mq_init_sched.
> As elevator_alloc is now only invoked from block layer core, we don't
> need to export it, so unexport elevator_alloc function.
>
> This refactoring provides a centralized location for elevator queue
> initialization, which makes it easier to store pre-allocated sched_tags
> in the struct elevator_queue during later changes.
>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
> ---
>  block/bfq-iosched.c   | 13 +++----------
>  block/blk-mq-sched.c  | 11 ++++++++---
>  block/elevator.c      |  1 -
>  block/elevator.h      |  2 +-
>  block/kyber-iosched.c | 11 ++---------
>  block/mq-deadline.c   | 14 ++------------
>  6 files changed, 16 insertions(+), 36 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
--
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

^ permalink raw reply	[flat|nested] 15+ messages in thread
* [PATCHv6 2/3] block: fix lockdep warning caused by lock dependency in elv_iosched_store
  2025-06-30  5:21 [PATCHv6 0/3] block: move sched_tags allocation/de-allocation outside of locking context Nilay Shroff
  2025-06-30  5:21 ` [PATCHv6 1/3] block: move elevator queue allocation logic into blk_mq_init_sched Nilay Shroff
@ 2025-06-30  5:21 ` Nilay Shroff
  2025-06-30  6:15   ` Christoph Hellwig
  ` (2 more replies)
  2025-06-30  5:21 ` [PATCHv6 3/3] block: fix potential deadlock while running nr_hw_queue update Nilay Shroff
  2 siblings, 3 replies; 15+ messages in thread
From: Nilay Shroff @ 2025-06-30  5:21 UTC (permalink / raw)
  To: linux-block; +Cc: hch, ming.lei, axboe, sth, lkp, gjoyce

Recent lockdep reports [1] have revealed a potential deadlock caused by a
lock dependency between the percpu allocator lock and the elevator lock.
This issue can be avoided by ensuring that the allocation and release of
scheduler tags (sched_tags) are performed outside the elevator lock.
Furthermore, the queue does not need to remain frozen during these
operations.

To address this, move all sched_tags allocations and deallocations outside
of both the ->elevator_lock and the ->freeze_lock. Since the lifetime of
the elevator queue and its associated sched_tags is closely tied, the
allocated sched_tags are now stored in the elevator queue structure. Then,
during the actual elevator switch (which runs under ->freeze_lock and
->elevator_lock), the pre-allocated sched_tags are assigned to the
appropriate q->hctx. Once the elevator switch is complete and the locks
are released, the old elevator queue and its associated sched_tags are
freed.

This commit specifically addresses the allocation/deallocation of
sched_tags during elevator switching. Note that sched_tags may also be
allocated in other contexts, such as during nr_hw_queues updates.
Supporting that use case will require batch allocation/deallocation,
which will be handled in a follow-up patch.
This restructuring ensures that sched_tags memory management occurs entirely outside of the ->elevator_lock and ->freeze_lock context, eliminating the lock dependency problem seen during scheduler updates. [1] https://lore.kernel.org/all/0659ea8d-a463-47c8-9180-43c719e106eb@linux.ibm.com/ Reported-by: Stefan Haberland <sth@linux.ibm.com> Closes: https://lore.kernel.org/all/0659ea8d-a463-47c8-9180-43c719e106eb@linux.ibm.com/ Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> --- block/blk-mq-sched.c | 155 +++++++++++++++++++++++-------------------- block/blk-mq-sched.h | 8 ++- block/elevator.c | 47 +++++++++++-- block/elevator.h | 14 +++- 4 files changed, 144 insertions(+), 80 deletions(-) diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c index 359e0704e09b..2d6d1ebdd8fb 100644 --- a/block/blk-mq-sched.c +++ b/block/blk-mq-sched.c @@ -374,64 +374,17 @@ bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq, } EXPORT_SYMBOL_GPL(blk_mq_sched_try_insert_merge); -static int blk_mq_sched_alloc_map_and_rqs(struct request_queue *q, - struct blk_mq_hw_ctx *hctx, - unsigned int hctx_idx) -{ - if (blk_mq_is_shared_tags(q->tag_set->flags)) { - hctx->sched_tags = q->sched_shared_tags; - return 0; - } - - hctx->sched_tags = blk_mq_alloc_map_and_rqs(q->tag_set, hctx_idx, - q->nr_requests); - - if (!hctx->sched_tags) - return -ENOMEM; - return 0; -} - -static void blk_mq_exit_sched_shared_tags(struct request_queue *queue) -{ - blk_mq_free_rq_map(queue->sched_shared_tags); - queue->sched_shared_tags = NULL; -} - /* called in queue's release handler, tagset has gone away */ static void blk_mq_sched_tags_teardown(struct request_queue *q, unsigned int flags) { struct blk_mq_hw_ctx *hctx; unsigned long i; - queue_for_each_hw_ctx(q, hctx, i) { - if (hctx->sched_tags) { - if (!blk_mq_is_shared_tags(flags)) - blk_mq_free_rq_map(hctx->sched_tags); - hctx->sched_tags = NULL; - } - } + queue_for_each_hw_ctx(q, hctx, i) + hctx->sched_tags = NULL; if 
(blk_mq_is_shared_tags(flags)) - blk_mq_exit_sched_shared_tags(q); -} - -static int blk_mq_init_sched_shared_tags(struct request_queue *queue) -{ - struct blk_mq_tag_set *set = queue->tag_set; - - /* - * Set initial depth at max so that we don't need to reallocate for - * updating nr_requests. - */ - queue->sched_shared_tags = blk_mq_alloc_map_and_rqs(set, - BLK_MQ_NO_HCTX_IDX, - MAX_SCHED_RQ); - if (!queue->sched_shared_tags) - return -ENOMEM; - - blk_mq_tag_update_sched_shared_tags(queue); - - return 0; + q->sched_shared_tags = NULL; } void blk_mq_sched_reg_debugfs(struct request_queue *q) @@ -458,8 +411,75 @@ void blk_mq_sched_unreg_debugfs(struct request_queue *q) mutex_unlock(&q->debugfs_mutex); } +void blk_mq_free_sched_tags(struct elevator_tags *et, + struct blk_mq_tag_set *set) +{ + unsigned long i; + + /* Shared tags are stored at index 0 in @tags. */ + if (blk_mq_is_shared_tags(set->flags)) + blk_mq_free_map_and_rqs(set, et->tags[0], BLK_MQ_NO_HCTX_IDX); + else { + for (i = 0; i < et->nr_hw_queues; i++) + blk_mq_free_map_and_rqs(set, et->tags[i], i); + } + + kfree(et); +} + +struct elevator_tags *blk_mq_alloc_sched_tags(struct blk_mq_tag_set *set, + unsigned int nr_hw_queues) +{ + unsigned int nr_tags; + int i; + struct elevator_tags *et; + gfp_t gfp = GFP_NOIO | __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY; + + if (blk_mq_is_shared_tags(set->flags)) + nr_tags = 1; + else + nr_tags = nr_hw_queues; + + et = kmalloc(sizeof(struct elevator_tags) + + nr_tags * sizeof(struct blk_mq_tags *), gfp); + if (!et) + return NULL; + /* + * Default to double of smaller one between hw queue_depth and + * 128, since we don't split into sync/async like the old code + * did. Additionally, this is a per-hw queue depth. + */ + et->nr_requests = 2 * min_t(unsigned int, set->queue_depth, + BLKDEV_DEFAULT_RQ); + et->nr_hw_queues = nr_hw_queues; + + if (blk_mq_is_shared_tags(set->flags)) { + /* Shared tags are stored at index 0 in @tags. 
*/ + et->tags[0] = blk_mq_alloc_map_and_rqs(set, BLK_MQ_NO_HCTX_IDX, + MAX_SCHED_RQ); + if (!et->tags[0]) + goto out; + } else { + for (i = 0; i < et->nr_hw_queues; i++) { + et->tags[i] = blk_mq_alloc_map_and_rqs(set, i, + et->nr_requests); + if (!et->tags[i]) + goto out_unwind; + } + } + + return et; +out_unwind: + while (--i >= 0) + blk_mq_free_map_and_rqs(set, et->tags[i], i); +out: + kfree(et); + return NULL; +} + /* caller must have a reference to @e, will grab another one if successful */ -int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e) +int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e, + struct elevator_tags *et) { unsigned int flags = q->tag_set->flags; struct blk_mq_hw_ctx *hctx; @@ -467,40 +487,33 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e) unsigned long i; int ret; - /* - * Default to double of smaller one between hw queue_depth and 128, - * since we don't split into sync/async like the old code did. - * Additionally, this is a per-hw queue depth. - */ - q->nr_requests = 2 * min_t(unsigned int, q->tag_set->queue_depth, - BLKDEV_DEFAULT_RQ); - - eq = elevator_alloc(q, e); + eq = elevator_alloc(q, e, et); if (!eq) return -ENOMEM; + q->nr_requests = et->nr_requests; + if (blk_mq_is_shared_tags(flags)) { - ret = blk_mq_init_sched_shared_tags(q); - if (ret) - goto err_put_elevator; + /* Shared tags are stored at index 0 in @et->tags. 
*/ + q->sched_shared_tags = et->tags[0]; + blk_mq_tag_update_sched_shared_tags(q); } queue_for_each_hw_ctx(q, hctx, i) { - ret = blk_mq_sched_alloc_map_and_rqs(q, hctx, i); - if (ret) - goto err_free_map_and_rqs; + if (blk_mq_is_shared_tags(flags)) + hctx->sched_tags = q->sched_shared_tags; + else + hctx->sched_tags = et->tags[i]; } ret = e->ops.init_sched(q, eq); if (ret) - goto err_free_map_and_rqs; + goto out; queue_for_each_hw_ctx(q, hctx, i) { if (e->ops.init_hctx) { ret = e->ops.init_hctx(hctx, i); if (ret) { - eq = q->elevator; - blk_mq_sched_free_rqs(q); blk_mq_exit_sched(q, eq); kobject_put(&eq->kobj); return ret; @@ -509,10 +522,8 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e) } return 0; -err_free_map_and_rqs: - blk_mq_sched_free_rqs(q); +out: blk_mq_sched_tags_teardown(q, flags); -err_put_elevator: kobject_put(&eq->kobj); q->elevator = NULL; return ret; diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h index 1326526bb733..0cde00cd1c47 100644 --- a/block/blk-mq-sched.h +++ b/block/blk-mq-sched.h @@ -18,10 +18,16 @@ void __blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx); void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx); -int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e); +int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e, + struct elevator_tags *et); void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e); void blk_mq_sched_free_rqs(struct request_queue *q); +struct elevator_tags *blk_mq_alloc_sched_tags(struct blk_mq_tag_set *set, + unsigned int nr_hw_queues); +void blk_mq_free_sched_tags(struct elevator_tags *et, + struct blk_mq_tag_set *set); + static inline void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx) { if (test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state)) diff --git a/block/elevator.c b/block/elevator.c index 770874040f79..50f4b78efe66 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -54,6 +54,8 @@ struct elv_change_ctx { 
struct elevator_queue *old; /* for registering new elevator */ struct elevator_queue *new; + /* holds sched tags data */ + struct elevator_tags *et; }; static DEFINE_SPINLOCK(elv_list_lock); @@ -132,7 +134,7 @@ static struct elevator_type *elevator_find_get(const char *name) static const struct kobj_type elv_ktype; struct elevator_queue *elevator_alloc(struct request_queue *q, - struct elevator_type *e) + struct elevator_type *e, struct elevator_tags *et) { struct elevator_queue *eq; @@ -145,6 +147,7 @@ struct elevator_queue *elevator_alloc(struct request_queue *q, kobject_init(&eq->kobj, &elv_ktype); mutex_init(&eq->sysfs_lock); hash_init(eq->hash); + eq->et = et; return eq; } @@ -165,7 +168,6 @@ static void elevator_exit(struct request_queue *q) lockdep_assert_held(&q->elevator_lock); ioc_clear_queue(q); - blk_mq_sched_free_rqs(q); mutex_lock(&e->sysfs_lock); blk_mq_exit_sched(q, e); @@ -591,7 +593,7 @@ static int elevator_switch(struct request_queue *q, struct elv_change_ctx *ctx) } if (new_e) { - ret = blk_mq_init_sched(q, new_e); + ret = blk_mq_init_sched(q, new_e, ctx->et); if (ret) goto out_unfreeze; ctx->new = q->elevator; @@ -626,8 +628,10 @@ static void elv_exit_and_release(struct request_queue *q) elevator_exit(q); mutex_unlock(&q->elevator_lock); blk_mq_unfreeze_queue(q, memflags); - if (e) + if (e) { + blk_mq_free_sched_tags(e->et, q->tag_set); kobject_put(&e->kobj); + } } static int elevator_change_done(struct request_queue *q, @@ -640,6 +644,7 @@ static int elevator_change_done(struct request_queue *q, &ctx->old->flags); elv_unregister_queue(q, ctx->old); + blk_mq_free_sched_tags(ctx->old->et, q->tag_set); kobject_put(&ctx->old->kobj); if (enable_wbt) wbt_enable_default(q->disk); @@ -658,9 +663,16 @@ static int elevator_change_done(struct request_queue *q, static int elevator_change(struct request_queue *q, struct elv_change_ctx *ctx) { unsigned int memflags; + struct blk_mq_tag_set *set = q->tag_set; int ret = 0; - 
lockdep_assert_held(&q->tag_set->update_nr_hwq_lock); + lockdep_assert_held(&set->update_nr_hwq_lock); + + if (strncmp(ctx->name, "none", 4)) { + ctx->et = blk_mq_alloc_sched_tags(set, set->nr_hw_queues); + if (!ctx->et) + return -ENOMEM; + } memflags = blk_mq_freeze_queue(q); /* @@ -680,6 +692,11 @@ static int elevator_change(struct request_queue *q, struct elv_change_ctx *ctx) blk_mq_unfreeze_queue(q, memflags); if (!ret) ret = elevator_change_done(q, ctx); + /* + * Free sched tags if it's allocated but we couldn't switch elevator. + */ + if (ctx->et && !ctx->new) + blk_mq_free_sched_tags(ctx->et, set); return ret; } @@ -690,11 +707,26 @@ static int elevator_change(struct request_queue *q, struct elv_change_ctx *ctx) */ void elv_update_nr_hw_queues(struct request_queue *q) { + struct blk_mq_tag_set *set = q->tag_set; struct elv_change_ctx ctx = {}; int ret = -ENODEV; WARN_ON_ONCE(q->mq_freeze_depth == 0); + /* + * Accessing q->elevator without holding q->elevator_lock is safe here + * because nr_hw_queue update is protected by set->update_nr_hwq_lock + * in the writer context. So, scheduler update/switch code (which + * acquires same lock in the reader context) can't run concurrently. + */ + if (q->elevator) { + ctx.et = blk_mq_alloc_sched_tags(set, set->nr_hw_queues); + if (!ctx.et) { + WARN_ON_ONCE(1); + return; + } + } + mutex_lock(&q->elevator_lock); if (q->elevator && !blk_queue_dying(q) && blk_queue_registered(q)) { ctx.name = q->elevator->type->elevator_name; @@ -706,6 +738,11 @@ void elv_update_nr_hw_queues(struct request_queue *q) blk_mq_unfreeze_queue_nomemrestore(q); if (!ret) WARN_ON_ONCE(elevator_change_done(q, &ctx)); + /* + * Free sched tags if it's allocated but we couldn't switch elevator. 
+ */ + if (ctx.et && !ctx.new) + blk_mq_free_sched_tags(ctx.et, set); } /* diff --git a/block/elevator.h b/block/elevator.h index a4de5f9ad790..adc5c157e17e 100644 --- a/block/elevator.h +++ b/block/elevator.h @@ -23,6 +23,15 @@ enum elv_merge { struct blk_mq_alloc_data; struct blk_mq_hw_ctx; +struct elevator_tags { + /* num. of hardware queues for which tags are allocated */ + unsigned int nr_hw_queues; + /* depth used while allocating tags */ + unsigned int nr_requests; + /* shared tag is stored at index 0 */ + struct blk_mq_tags *tags[]; +}; + struct elevator_mq_ops { int (*init_sched)(struct request_queue *, struct elevator_queue *); void (*exit_sched)(struct elevator_queue *); @@ -113,6 +122,7 @@ struct request *elv_rqhash_find(struct request_queue *q, sector_t offset); struct elevator_queue { struct elevator_type *type; + struct elevator_tags *et; void *elevator_data; struct kobject kobj; struct mutex sysfs_lock; @@ -152,8 +162,8 @@ ssize_t elv_iosched_show(struct gendisk *disk, char *page); ssize_t elv_iosched_store(struct gendisk *disk, const char *page, size_t count); extern bool elv_bio_merge_ok(struct request *, struct bio *); -extern struct elevator_queue *elevator_alloc(struct request_queue *, - struct elevator_type *); +struct elevator_queue *elevator_alloc(struct request_queue *, + struct elevator_type *, struct elevator_tags *); /* * Helper functions. -- 2.50.0 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCHv6 2/3] block: fix lockdep warning caused by lock dependency in elv_iosched_store
  2025-06-30  5:21 ` [PATCHv6 2/3] block: fix lockdep warning caused by lock dependency in elv_iosched_store Nilay Shroff
@ 2025-06-30  6:15   ` Christoph Hellwig
  2025-06-30  6:20   ` Hannes Reinecke
  2025-07-01  3:19   ` Ming Lei
  2 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2025-06-30  6:15 UTC (permalink / raw)
  To: Nilay Shroff; +Cc: linux-block, ming.lei, axboe, sth, lkp, gjoyce

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [PATCHv6 2/3] block: fix lockdep warning caused by lock dependency in elv_iosched_store 2025-06-30 5:21 ` [PATCHv6 2/3] block: fix lockdep warning caused by lock dependency in elv_iosched_store Nilay Shroff 2025-06-30 6:15 ` Christoph Hellwig @ 2025-06-30 6:20 ` Hannes Reinecke 2025-06-30 6:48 ` Nilay Shroff 2025-07-01 3:19 ` Ming Lei 2 siblings, 1 reply; 15+ messages in thread From: Hannes Reinecke @ 2025-06-30 6:20 UTC (permalink / raw) To: Nilay Shroff, linux-block; +Cc: hch, ming.lei, axboe, sth, lkp, gjoyce On 6/30/25 07:21, Nilay Shroff wrote: > Recent lockdep reports [1] have revealed a potential deadlock caused by a > lock dependency between the percpu allocator lock and the elevator lock. > This issue can be avoided by ensuring that the allocation and release of > scheduler tags (sched_tags) are performed outside the elevator lock. > Furthermore, the queue does not need to be remain frozen during these > operations. > > To address this, move all sched_tags allocations and deallocations outside > of both the ->elevator_lock and the ->freeze_lock. Since the lifetime of > the elevator queue and its associated sched_tags is closely tied, the > allocated sched_tags are now stored in the elevator queue structure. Then, > during the actual elevator switch (which runs under ->freeze_lock and > ->elevator_lock), the pre-allocated sched_tags are assigned to the > appropriate q->hctx. Once the elevator switch is complete and the locks > are released, the old elevator queue and its associated sched_tags are > freed. > > This commit specifically addresses the allocation/deallocation of sched_ > tags during elevator switching. Note that sched_tags may also be allocated > in other contexts, such as during nr_hw_queues updates. Supporting that > use case will require batch allocation/deallocation, which will be handled > in a follow-up patch. 
> > This restructuring ensures that sched_tags memory management occurs > entirely outside of the ->elevator_lock and ->freeze_lock context, > eliminating the lock dependency problem seen during scheduler updates. > > [1] https://lore.kernel.org/all/0659ea8d-a463-47c8-9180-43c719e106eb@linux.ibm.com/ > > Reported-by: Stefan Haberland <sth@linux.ibm.com> > Closes: https://lore.kernel.org/all/0659ea8d-a463-47c8-9180-43c719e106eb@linux.ibm.com/ > Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> > --- > block/blk-mq-sched.c | 155 +++++++++++++++++++++++-------------------- > block/blk-mq-sched.h | 8 ++- > block/elevator.c | 47 +++++++++++-- > block/elevator.h | 14 +++- > 4 files changed, 144 insertions(+), 80 deletions(-) > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c > index 359e0704e09b..2d6d1ebdd8fb 100644 > --- a/block/blk-mq-sched.c > +++ b/block/blk-mq-sched.c > @@ -374,64 +374,17 @@ bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq, > } > EXPORT_SYMBOL_GPL(blk_mq_sched_try_insert_merge); > > -static int blk_mq_sched_alloc_map_and_rqs(struct request_queue *q, > - struct blk_mq_hw_ctx *hctx, > - unsigned int hctx_idx) > -{ > - if (blk_mq_is_shared_tags(q->tag_set->flags)) { > - hctx->sched_tags = q->sched_shared_tags; > - return 0; > - } > - > - hctx->sched_tags = blk_mq_alloc_map_and_rqs(q->tag_set, hctx_idx, > - q->nr_requests); > - > - if (!hctx->sched_tags) > - return -ENOMEM; > - return 0; > -} > - > -static void blk_mq_exit_sched_shared_tags(struct request_queue *queue) > -{ > - blk_mq_free_rq_map(queue->sched_shared_tags); > - queue->sched_shared_tags = NULL; > -} > - > /* called in queue's release handler, tagset has gone away */ > static void blk_mq_sched_tags_teardown(struct request_queue *q, unsigned int flags) > { > struct blk_mq_hw_ctx *hctx; > unsigned long i; > > - queue_for_each_hw_ctx(q, hctx, i) { > - if (hctx->sched_tags) { > - if (!blk_mq_is_shared_tags(flags)) > - 
blk_mq_free_rq_map(hctx->sched_tags); > - hctx->sched_tags = NULL; > - } > - } > + queue_for_each_hw_ctx(q, hctx, i) > + hctx->sched_tags = NULL; > > if (blk_mq_is_shared_tags(flags)) > - blk_mq_exit_sched_shared_tags(q); > -} > - > -static int blk_mq_init_sched_shared_tags(struct request_queue *queue) > -{ > - struct blk_mq_tag_set *set = queue->tag_set; > - > - /* > - * Set initial depth at max so that we don't need to reallocate for > - * updating nr_requests. > - */ > - queue->sched_shared_tags = blk_mq_alloc_map_and_rqs(set, > - BLK_MQ_NO_HCTX_IDX, > - MAX_SCHED_RQ); > - if (!queue->sched_shared_tags) > - return -ENOMEM; > - > - blk_mq_tag_update_sched_shared_tags(queue); > - > - return 0; > + q->sched_shared_tags = NULL; > } > > void blk_mq_sched_reg_debugfs(struct request_queue *q) > @@ -458,8 +411,75 @@ void blk_mq_sched_unreg_debugfs(struct request_queue *q) > mutex_unlock(&q->debugfs_mutex); > } > > +void blk_mq_free_sched_tags(struct elevator_tags *et, > + struct blk_mq_tag_set *set) > +{ > + unsigned long i; > + > + /* Shared tags are stored at index 0 in @tags. */ > + if (blk_mq_is_shared_tags(set->flags)) > + blk_mq_free_map_and_rqs(set, et->tags[0], BLK_MQ_NO_HCTX_IDX); > + else { > + for (i = 0; i < et->nr_hw_queues; i++) > + blk_mq_free_map_and_rqs(set, et->tags[i], i); > + } > + > + kfree(et); > +} > + > +struct elevator_tags *blk_mq_alloc_sched_tags(struct blk_mq_tag_set *set, > + unsigned int nr_hw_queues) > +{ > + unsigned int nr_tags; > + int i; > + struct elevator_tags *et; > + gfp_t gfp = GFP_NOIO | __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY; > + > + if (blk_mq_is_shared_tags(set->flags)) > + nr_tags = 1; > + else > + nr_tags = nr_hw_queues; > + > + et = kmalloc(sizeof(struct elevator_tags) + > + nr_tags * sizeof(struct blk_mq_tags *), gfp); > + if (!et) > + return NULL; > + /* > + * Default to double of smaller one between hw queue_depth and > + * 128, since we don't split into sync/async like the old code > + * did. 
Additionally, this is a per-hw queue depth. > + */ > + et->nr_requests = 2 * min_t(unsigned int, set->queue_depth, > + BLKDEV_DEFAULT_RQ); > + et->nr_hw_queues = nr_hw_queues; > + > + if (blk_mq_is_shared_tags(set->flags)) { > + /* Shared tags are stored at index 0 in @tags. */ > + et->tags[0] = blk_mq_alloc_map_and_rqs(set, BLK_MQ_NO_HCTX_IDX, > + MAX_SCHED_RQ); > + if (!et->tags[0]) > + goto out; > + } else { > + for (i = 0; i < et->nr_hw_queues; i++) { > + et->tags[i] = blk_mq_alloc_map_and_rqs(set, i, > + et->nr_requests); > + if (!et->tags[i]) > + goto out_unwind; > + } > + } > + > + return et; > +out_unwind: > + while (--i >= 0) > + blk_mq_free_map_and_rqs(set, et->tags[i], i); > +out: > + kfree(et); > + return NULL; > +} > + As smatch stated, the unwind pattern is a bit odd. Maybe move the unwind into the 'else' branch, and us a conditional to invoke it: if (i < et->nr_hw_queues) while (--i >= 0) blk_mq_free_map_and_request() Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect hare@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCHv6 2/3] block: fix lockdep warning caused by lock dependency in elv_iosched_store 2025-06-30 6:20 ` Hannes Reinecke @ 2025-06-30 6:48 ` Nilay Shroff 0 siblings, 0 replies; 15+ messages in thread From: Nilay Shroff @ 2025-06-30 6:48 UTC (permalink / raw) To: Hannes Reinecke, linux-block; +Cc: hch, ming.lei, axboe, sth, lkp, gjoyce On 6/30/25 11:50 AM, Hannes Reinecke wrote: >> +struct elevator_tags *blk_mq_alloc_sched_tags(struct blk_mq_tag_set *set, >> + unsigned int nr_hw_queues) >> +{ >> + unsigned int nr_tags; >> + int i; >> + struct elevator_tags *et; >> + gfp_t gfp = GFP_NOIO | __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY; >> + >> + if (blk_mq_is_shared_tags(set->flags)) >> + nr_tags = 1; >> + else >> + nr_tags = nr_hw_queues; >> + >> + et = kmalloc(sizeof(struct elevator_tags) + >> + nr_tags * sizeof(struct blk_mq_tags *), gfp); >> + if (!et) >> + return NULL; >> + /* >> + * Default to double of smaller one between hw queue_depth and >> + * 128, since we don't split into sync/async like the old code >> + * did. Additionally, this is a per-hw queue depth. >> + */ >> + et->nr_requests = 2 * min_t(unsigned int, set->queue_depth, >> + BLKDEV_DEFAULT_RQ); >> + et->nr_hw_queues = nr_hw_queues; >> + >> + if (blk_mq_is_shared_tags(set->flags)) { >> + /* Shared tags are stored at index 0 in @tags. */ >> + et->tags[0] = blk_mq_alloc_map_and_rqs(set, BLK_MQ_NO_HCTX_IDX, >> + MAX_SCHED_RQ); >> + if (!et->tags[0]) >> + goto out; >> + } else { >> + for (i = 0; i < et->nr_hw_queues; i++) { >> + et->tags[i] = blk_mq_alloc_map_and_rqs(set, i, >> + et->nr_requests); >> + if (!et->tags[i]) >> + goto out_unwind; >> + } >> + } >> + >> + return et; >> +out_unwind: >> + while (--i >= 0) >> + blk_mq_free_map_and_rqs(set, et->tags[i], i); >> +out: >> + kfree(et); >> + return NULL; >> +} >> + > > As smatch stated, the unwind pattern is a bit odd. 
> Maybe move the unwind into the 'else' branch, and us a conditional > to invoke it: > > if (i < et->nr_hw_queues) > while (--i >= 0) > blk_mq_free_map_and_request() > I believe the 'if (i < et->nr_hw_queues)' check is unnecessary here. When we jump to the @out_unwind label, @i is always less than @et->nr_hw_queues because the for loop exits early (on allocation failure) before reaching the upper bound. If @i had reached @et->nr_hw_queues, the loop would have completed and we wouldn't jump to @out_unwind at all; we'd simply return @et instead. Smatch flagged the unwind loop due to the use of an unsigned @i in the previous patch. In that case, if the first allocation (i == 0) fails, then '--i' underflows to UINT_MAX, and the condition '--i >= 0' is always true, hence the warning. This patch corrects that by declaring @i as a signed int, so that '--i >= 0' behaves as expected and avoids the Smatch warning. So, I don't think an extra condition like 'if (i < et->nr_hw_queues)' is needed around the unwind loop. Agreed? Thanks, --Nilay ^ permalink raw reply [flat|nested] 15+ messages in thread
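The signed-versus-unsigned point made in the reply above is easy to demonstrate in isolation. The following standalone sketch (a hypothetical helper, not kernel code) counts how many unwind steps the loop from the patch would take:

```c
#include <assert.h>

/* Counts the iterations of the unwind loop from the patch. With a
 * signed counter the loop is safe even when the very first allocation
 * (i == 0) fails: --i yields -1, so the body never runs. Had @i been
 * unsigned, --i would wrap to UINT_MAX and the loop would walk far
 * past the start of the array. */
static int count_unwind_steps(int i)
{
	int steps = 0;

	while (--i >= 0)
		steps++;
	return steps;
}
```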
* Re: [PATCHv6 2/3] block: fix lockdep warning caused by lock dependency in elv_iosched_store 2025-06-30 5:21 ` [PATCHv6 2/3] block: fix lockdep warning caused by lock dependency in elv_iosched_store Nilay Shroff 2025-06-30 6:15 ` Christoph Hellwig 2025-06-30 6:20 ` Hannes Reinecke @ 2025-07-01 3:19 ` Ming Lei 2 siblings, 0 replies; 15+ messages in thread From: Ming Lei @ 2025-07-01 3:19 UTC (permalink / raw) To: Nilay Shroff; +Cc: linux-block, hch, axboe, sth, lkp, gjoyce On Mon, Jun 30, 2025 at 10:51:55AM +0530, Nilay Shroff wrote: > Recent lockdep reports [1] have revealed a potential deadlock caused by a > lock dependency between the percpu allocator lock and the elevator lock. > This issue can be avoided by ensuring that the allocation and release of > scheduler tags (sched_tags) are performed outside the elevator lock. > Furthermore, the queue does not need to remain frozen during these > operations. > > To address this, move all sched_tags allocations and deallocations outside > of both the ->elevator_lock and the ->freeze_lock. Since the lifetime of > the elevator queue and its associated sched_tags is closely tied, the > allocated sched_tags are now stored in the elevator queue structure. Then, > during the actual elevator switch (which runs under ->freeze_lock and > ->elevator_lock), the pre-allocated sched_tags are assigned to the > appropriate q->hctx. Once the elevator switch is complete and the locks > are released, the old elevator queue and its associated sched_tags are > freed. > > This commit specifically addresses the allocation/deallocation of sched_tags > during elevator switching. Note that sched_tags may also be allocated > in other contexts, such as during nr_hw_queues updates. Supporting that > use case will require batch allocation/deallocation, which will be handled > in a follow-up patch. 
> > This restructuring ensures that sched_tags memory management occurs > entirely outside of the ->elevator_lock and ->freeze_lock context, > eliminating the lock dependency problem seen during scheduler updates. > > [1] https://lore.kernel.org/all/0659ea8d-a463-47c8-9180-43c719e106eb@linux.ibm.com/ > > Reported-by: Stefan Haberland <sth@linux.ibm.com> > Closes: https://lore.kernel.org/all/0659ea8d-a463-47c8-9180-43c719e106eb@linux.ibm.com/ > Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Thanks, Ming ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCHv6 3/3] block: fix potential deadlock while running nr_hw_queue update 2025-06-30 5:21 [PATCHv6 0/3] block: move sched_tags allocation/de-allocation outside of locking context Nilay Shroff 2025-06-30 5:21 ` [PATCHv6 1/3] block: move elevator queue allocation logic into blk_mq_init_sched Nilay Shroff 2025-06-30 5:21 ` [PATCHv6 2/3] block: fix lockdep warning caused by lock dependency in elv_iosched_store Nilay Shroff @ 2025-06-30 5:21 ` Nilay Shroff 2025-06-30 6:16 ` Christoph Hellwig ` (2 more replies) 2 siblings, 3 replies; 15+ messages in thread From: Nilay Shroff @ 2025-06-30 5:21 UTC (permalink / raw) To: linux-block; +Cc: hch, ming.lei, axboe, sth, lkp, gjoyce Move scheduler tags (sched_tags) allocation and deallocation outside both the ->elevator_lock and ->freeze_lock when updating nr_hw_queues. This change breaks the dependency chain from the percpu allocator lock to the elevator lock, helping to prevent potential deadlocks, as observed in the reported lockdep splat[1]. This commit introduces batch allocation and deallocation helpers for sched_tags, which are now used from within __blk_mq_update_nr_hw_queues routine while iterating through the tagset. With this change, all sched_tags memory management is handled entirely outside the ->elevator_lock and the ->freeze_lock context, thereby eliminating the lock dependency that could otherwise manifest during nr_hw_queues updates. 
[1] https://lore.kernel.org/all/0659ea8d-a463-47c8-9180-43c719e106eb@linux.ibm.com/ Reported-by: Stefan Haberland <sth@linux.ibm.com> Closes: https://lore.kernel.org/all/0659ea8d-a463-47c8-9180-43c719e106eb@linux.ibm.com/ Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> --- block/blk-mq-sched.c | 63 ++++++++++++++++++++++++++++++++++++++++++++ block/blk-mq-sched.h | 4 +++ block/blk-mq.c | 11 +++++++- block/blk.h | 2 +- block/elevator.c | 4 +-- 5 files changed, 80 insertions(+), 4 deletions(-) diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c index 2d6d1ebdd8fb..da802df34a8c 100644 --- a/block/blk-mq-sched.c +++ b/block/blk-mq-sched.c @@ -427,6 +427,30 @@ void blk_mq_free_sched_tags(struct elevator_tags *et, kfree(et); } +void blk_mq_free_sched_tags_batch(struct xarray *et_table, + struct blk_mq_tag_set *set) +{ + struct request_queue *q; + struct elevator_tags *et; + + lockdep_assert_held_write(&set->update_nr_hwq_lock); + + list_for_each_entry(q, &set->tag_list, tag_set_list) { + /* + * Accessing q->elevator without holding q->elevator_lock is + * safe because we're holding here set->update_nr_hwq_lock in + * the writer context. So, scheduler update/switch code (which + * acquires the same lock but in the reader context) can't run + * concurrently. 
+ */ + if (q->elevator) { + et = xa_load(et_table, q->id); + if (et) + blk_mq_free_sched_tags(et, set); + } + } +} + struct elevator_tags *blk_mq_alloc_sched_tags(struct blk_mq_tag_set *set, unsigned int nr_hw_queues) { @@ -477,6 +501,45 @@ struct elevator_tags *blk_mq_alloc_sched_tags(struct blk_mq_tag_set *set, return NULL; } +int blk_mq_alloc_sched_tags_batch(struct xarray *et_table, + struct blk_mq_tag_set *set, unsigned int nr_hw_queues) +{ + struct request_queue *q; + struct elevator_tags *et; + gfp_t gfp = GFP_NOIO | __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY; + + lockdep_assert_held_write(&set->update_nr_hwq_lock); + + list_for_each_entry(q, &set->tag_list, tag_set_list) { + /* + * Accessing q->elevator without holding q->elevator_lock is + * safe because we're holding here set->update_nr_hwq_lock in + * the writer context. So, scheduler update/switch code (which + * acquires the same lock but in the reader context) can't run + * concurrently. + */ + if (q->elevator) { + et = blk_mq_alloc_sched_tags(set, nr_hw_queues); + if (!et) + goto out_unwind; + if (xa_insert(et_table, q->id, et, gfp)) + goto out_free_tags; + } + } + return 0; +out_free_tags: + blk_mq_free_sched_tags(et, set); +out_unwind: + list_for_each_entry_continue_reverse(q, &set->tag_list, tag_set_list) { + if (q->elevator) { + et = xa_load(et_table, q->id); + if (et) + blk_mq_free_sched_tags(et, set); + } + } + return -ENOMEM; +} + /* caller must have a reference to @e, will grab another one if successful */ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e, struct elevator_tags *et) diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h index 0cde00cd1c47..b554e1d55950 100644 --- a/block/blk-mq-sched.h +++ b/block/blk-mq-sched.h @@ -25,8 +25,12 @@ void blk_mq_sched_free_rqs(struct request_queue *q); struct elevator_tags *blk_mq_alloc_sched_tags(struct blk_mq_tag_set *set, unsigned int nr_hw_queues); +int blk_mq_alloc_sched_tags_batch(struct xarray *et_table, + struct 
blk_mq_tag_set *set, unsigned int nr_hw_queues); void blk_mq_free_sched_tags(struct elevator_tags *et, struct blk_mq_tag_set *set); +void blk_mq_free_sched_tags_batch(struct xarray *et_table, + struct blk_mq_tag_set *set); static inline void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx) { diff --git a/block/blk-mq.c b/block/blk-mq.c index 4806b867e37d..a68b658ce07b 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -4972,6 +4972,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, struct request_queue *q; int prev_nr_hw_queues = set->nr_hw_queues; unsigned int memflags; + struct xarray et_table; int i; lockdep_assert_held(&set->tag_list_lock); @@ -4984,6 +4985,11 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, return; memflags = memalloc_noio_save(); + + xa_init(&et_table); + if (blk_mq_alloc_sched_tags_batch(&et_table, set, nr_hw_queues) < 0) + goto out_memalloc_restore; + list_for_each_entry(q, &set->tag_list, tag_set_list) { blk_mq_debugfs_unregister_hctxs(q); blk_mq_sysfs_unregister_hctxs(q); @@ -4995,6 +5001,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, if (blk_mq_realloc_tag_set_tags(set, nr_hw_queues) < 0) { list_for_each_entry(q, &set->tag_list, tag_set_list) blk_mq_unfreeze_queue_nomemrestore(q); + blk_mq_free_sched_tags_batch(&et_table, set); goto reregister; } @@ -5019,7 +5026,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, /* elv_update_nr_hw_queues() unfreeze queue for us */ list_for_each_entry(q, &set->tag_list, tag_set_list) - elv_update_nr_hw_queues(q); + elv_update_nr_hw_queues(q, &et_table); reregister: list_for_each_entry(q, &set->tag_list, tag_set_list) { @@ -5029,7 +5036,9 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, blk_mq_remove_hw_queues_cpuhp(q); blk_mq_add_hw_queues_cpuhp(q); } +out_memalloc_restore: memalloc_noio_restore(memflags); + xa_destroy(&et_table); /* Free the excess tags when nr_hw_queues shrink. 
*/ for (i = set->nr_hw_queues; i < prev_nr_hw_queues; i++) diff --git a/block/blk.h b/block/blk.h index 37ec459fe656..c6d1d1458388 100644 --- a/block/blk.h +++ b/block/blk.h @@ -321,7 +321,7 @@ bool blk_bio_list_merge(struct request_queue *q, struct list_head *list, bool blk_insert_flush(struct request *rq); -void elv_update_nr_hw_queues(struct request_queue *q); +void elv_update_nr_hw_queues(struct request_queue *q, struct xarray *et_table); void elevator_set_default(struct request_queue *q); void elevator_set_none(struct request_queue *q); diff --git a/block/elevator.c b/block/elevator.c index 50f4b78efe66..8ba8b869d5a4 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -705,7 +705,7 @@ static int elevator_change(struct request_queue *q, struct elv_change_ctx *ctx) * The I/O scheduler depends on the number of hardware queues, this forces a * reattachment when nr_hw_queues changes. */ -void elv_update_nr_hw_queues(struct request_queue *q) +void elv_update_nr_hw_queues(struct request_queue *q, struct xarray *et_table) { struct blk_mq_tag_set *set = q->tag_set; struct elv_change_ctx ctx = {}; @@ -720,7 +720,7 @@ void elv_update_nr_hw_queues(struct request_queue *q) * acquires same lock in the reader context) can't run concurrently. */ if (q->elevator) { - ctx.et = blk_mq_alloc_sched_tags(set, set->nr_hw_queues); + ctx.et = xa_load(et_table, q->id); if (!ctx.et) { WARN_ON_ONCE(1); return; -- 2.50.0 ^ permalink raw reply related [flat|nested] 15+ messages in thread
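The all-or-nothing discipline of blk_mq_alloc_sched_tags_batch() in the patch above can be sketched in userspace C. This is a hypothetical illustration (names and the @fail_at knob are made up; @fail_at simulates an allocation failure at a given index):

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace sketch of the batch pattern: allocate one object per slot,
 * and on failure free everything already allocated before returning.
 * Pass @fail_at >= @n to make every allocation succeed. */
static int batch_alloc(void **table, size_t n, size_t fail_at)
{
	size_t i;

	for (i = 0; i < n; i++) {
		table[i] = (i == fail_at) ? NULL : malloc(16);
		if (!table[i])
			goto out_unwind;
	}
	return 0;

out_unwind:
	while (i-- > 0) {       /* unwind idiom safe even for unsigned i */
		free(table[i]);
		table[i] = NULL;
	}
	return -1;              /* stands in for -ENOMEM */
}
```

The caller either gets a fully populated table or an untouched one, which is what lets __blk_mq_update_nr_hw_queues() bail out cleanly before freezing any queue.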
* Re: [PATCHv6 3/3] block: fix potential deadlock while running nr_hw_queue update 2025-06-30 5:21 ` [PATCHv6 3/3] block: fix potential deadlock while running nr_hw_queue update Nilay Shroff @ 2025-06-30 6:16 ` Christoph Hellwig 2025-06-30 6:24 ` Hannes Reinecke 2025-07-01 3:24 ` Ming Lei 2 siblings, 0 replies; 15+ messages in thread From: Christoph Hellwig @ 2025-06-30 6:16 UTC (permalink / raw) To: Nilay Shroff; +Cc: linux-block, hch, ming.lei, axboe, sth, lkp, gjoyce Looks good: Reviewed-by: Christoph Hellwig <hch@lst.de> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCHv6 3/3] block: fix potential deadlock while running nr_hw_queue update 2025-06-30 5:21 ` [PATCHv6 3/3] block: fix potential deadlock while running nr_hw_queue update Nilay Shroff 2025-06-30 6:16 ` Christoph Hellwig @ 2025-06-30 6:24 ` Hannes Reinecke 2025-06-30 6:57 ` Nilay Shroff 2025-07-01 3:24 ` Ming Lei 2 siblings, 1 reply; 15+ messages in thread From: Hannes Reinecke @ 2025-06-30 6:24 UTC (permalink / raw) To: Nilay Shroff, linux-block; +Cc: hch, ming.lei, axboe, sth, lkp, gjoyce On 6/30/25 07:21, Nilay Shroff wrote: > Move scheduler tags (sched_tags) allocation and deallocation outside > both the ->elevator_lock and ->freeze_lock when updating nr_hw_queues. > This change breaks the dependency chain from the percpu allocator lock > to the elevator lock, helping to prevent potential deadlocks, as > observed in the reported lockdep splat[1]. > > This commit introduces batch allocation and deallocation helpers for > sched_tags, which are now used from within __blk_mq_update_nr_hw_queues > routine while iterating through the tagset. > > With this change, all sched_tags memory management is handled entirely > outside the ->elevator_lock and the ->freeze_lock context, thereby > eliminating the lock dependency that could otherwise manifest during > nr_hw_queues updates. 
> > [1] https://lore.kernel.org/all/0659ea8d-a463-47c8-9180-43c719e106eb@linux.ibm.com/ > > Reported-by: Stefan Haberland <sth@linux.ibm.com> > Closes: https://lore.kernel.org/all/0659ea8d-a463-47c8-9180-43c719e106eb@linux.ibm.com/ > Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> > --- > block/blk-mq-sched.c | 63 ++++++++++++++++++++++++++++++++++++++++++++ > block/blk-mq-sched.h | 4 +++ > block/blk-mq.c | 11 +++++++- > block/blk.h | 2 +- > block/elevator.c | 4 +-- > 5 files changed, 80 insertions(+), 4 deletions(-) > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c > index 2d6d1ebdd8fb..da802df34a8c 100644 > --- a/block/blk-mq-sched.c > +++ b/block/blk-mq-sched.c > @@ -427,6 +427,30 @@ void blk_mq_free_sched_tags(struct elevator_tags *et, > kfree(et); > } > > +void blk_mq_free_sched_tags_batch(struct xarray *et_table, > + struct blk_mq_tag_set *set) > +{ > + struct request_queue *q; > + struct elevator_tags *et; > + > + lockdep_assert_held_write(&set->update_nr_hwq_lock); > + > + list_for_each_entry(q, &set->tag_list, tag_set_list) { > + /* > + * Accessing q->elevator without holding q->elevator_lock is > + * safe because we're holding here set->update_nr_hwq_lock in > + * the writer context. So, scheduler update/switch code (which > + * acquires the same lock but in the reader context) can't run > + * concurrently. > + */ > + if (q->elevator) { > + et = xa_load(et_table, q->id); > + if (et) > + blk_mq_free_sched_tags(et, set); > + } > + } > +} > + Odd. Do we allow empty sched_tags? Why isn't this an error (or at least a warning)? Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect hare@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCHv6 3/3] block: fix potential deadlock while running nr_hw_queue update 2025-06-30 6:24 ` Hannes Reinecke @ 2025-06-30 6:57 ` Nilay Shroff 0 siblings, 0 replies; 15+ messages in thread From: Nilay Shroff @ 2025-06-30 6:57 UTC (permalink / raw) To: Hannes Reinecke, linux-block; +Cc: hch, ming.lei, axboe, sth, lkp, gjoyce On 6/30/25 11:54 AM, Hannes Reinecke wrote: > On 6/30/25 07:21, Nilay Shroff wrote: >> Move scheduler tags (sched_tags) allocation and deallocation outside >> both the ->elevator_lock and ->freeze_lock when updating nr_hw_queues. >> This change breaks the dependency chain from the percpu allocator lock >> to the elevator lock, helping to prevent potential deadlocks, as >> observed in the reported lockdep splat[1]. >> >> This commit introduces batch allocation and deallocation helpers for >> sched_tags, which are now used from within __blk_mq_update_nr_hw_queues >> routine while iterating through the tagset. >> >> With this change, all sched_tags memory management is handled entirely >> outside the ->elevator_lock and the ->freeze_lock context, thereby >> eliminating the lock dependency that could otherwise manifest during >> nr_hw_queues updates. 
>> >> [1] https://lore.kernel.org/all/0659ea8d-a463-47c8-9180-43c719e106eb@linux.ibm.com/ >> >> Reported-by: Stefan Haberland <sth@linux.ibm.com> >> Closes: https://lore.kernel.org/all/0659ea8d-a463-47c8-9180-43c719e106eb@linux.ibm.com/ >> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> >> --- >> block/blk-mq-sched.c | 63 ++++++++++++++++++++++++++++++++++++++++++++ >> block/blk-mq-sched.h | 4 +++ >> block/blk-mq.c | 11 +++++++- >> block/blk.h | 2 +- >> block/elevator.c | 4 +-- >> 5 files changed, 80 insertions(+), 4 deletions(-) >> >> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c >> index 2d6d1ebdd8fb..da802df34a8c 100644 >> --- a/block/blk-mq-sched.c >> +++ b/block/blk-mq-sched.c >> @@ -427,6 +427,30 @@ void blk_mq_free_sched_tags(struct elevator_tags *et, >> kfree(et); >> } >> +void blk_mq_free_sched_tags_batch(struct xarray *et_table, >> + struct blk_mq_tag_set *set) >> +{ >> + struct request_queue *q; >> + struct elevator_tags *et; >> + >> + lockdep_assert_held_write(&set->update_nr_hwq_lock); >> + >> + list_for_each_entry(q, &set->tag_list, tag_set_list) { >> + /* >> + * Accessing q->elevator without holding q->elevator_lock is >> + * safe because we're holding here set->update_nr_hwq_lock in >> + * the writer context. So, scheduler update/switch code (which >> + * acquires the same lock but in the reader context) can't run >> + * concurrently. >> + */ >> + if (q->elevator) { >> + et = xa_load(et_table, q->id); >> + if (et) >> + blk_mq_free_sched_tags(et, set); >> + } >> + } >> +} >> + > Odd. Do we allow empty sched_tags? > Why isn't this an error (or at least a warning)? > We don't allow empty sched_tags and xa_load is not supposed to return an empty sched_tags. It's just a paranoia guard that we evaluate whether @et is non-NULL. As this function returns nothing we don't return any error code to the caller here. But yes, we may add a warning here if that seems reasonable. I'd add a 'WARN_ON(!et)' and update it in the next patch. 
Thanks, --Nilay ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCHv6 3/3] block: fix potential deadlock while running nr_hw_queue update 2025-06-30 5:21 ` [PATCHv6 3/3] block: fix potential deadlock while running nr_hw_queue update Nilay Shroff 2025-06-30 6:16 ` Christoph Hellwig 2025-06-30 6:24 ` Hannes Reinecke @ 2025-07-01 3:24 ` Ming Lei 2025-07-01 5:20 ` Nilay Shroff 2 siblings, 1 reply; 15+ messages in thread From: Ming Lei @ 2025-07-01 3:24 UTC (permalink / raw) To: Nilay Shroff; +Cc: linux-block, hch, axboe, sth, lkp, gjoyce On Mon, Jun 30, 2025 at 10:51:56AM +0530, Nilay Shroff wrote: > Move scheduler tags (sched_tags) allocation and deallocation outside > both the ->elevator_lock and ->freeze_lock when updating nr_hw_queues. > This change breaks the dependency chain from the percpu allocator lock > to the elevator lock, helping to prevent potential deadlocks, as > observed in the reported lockdep splat[1]. > > This commit introduces batch allocation and deallocation helpers for > sched_tags, which are now used from within __blk_mq_update_nr_hw_queues > routine while iterating through the tagset. > > With this change, all sched_tags memory management is handled entirely > outside the ->elevator_lock and the ->freeze_lock context, thereby > eliminating the lock dependency that could otherwise manifest during > nr_hw_queues updates. 
> > [1] https://lore.kernel.org/all/0659ea8d-a463-47c8-9180-43c719e106eb@linux.ibm.com/ > > Reported-by: Stefan Haberland <sth@linux.ibm.com> > Closes: https://lore.kernel.org/all/0659ea8d-a463-47c8-9180-43c719e106eb@linux.ibm.com/ > Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> > --- > block/blk-mq-sched.c | 63 ++++++++++++++++++++++++++++++++++++++++++++ > block/blk-mq-sched.h | 4 +++ > block/blk-mq.c | 11 +++++++- > block/blk.h | 2 +- > block/elevator.c | 4 +-- > 5 files changed, 80 insertions(+), 4 deletions(-) > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c > index 2d6d1ebdd8fb..da802df34a8c 100644 > --- a/block/blk-mq-sched.c > +++ b/block/blk-mq-sched.c > @@ -427,6 +427,30 @@ void blk_mq_free_sched_tags(struct elevator_tags *et, > kfree(et); > } > > +void blk_mq_free_sched_tags_batch(struct xarray *et_table, > + struct blk_mq_tag_set *set) > +{ > + struct request_queue *q; > + struct elevator_tags *et; > + > + lockdep_assert_held_write(&set->update_nr_hwq_lock); > + > + list_for_each_entry(q, &set->tag_list, tag_set_list) { > + /* > + * Accessing q->elevator without holding q->elevator_lock is > + * safe because we're holding here set->update_nr_hwq_lock in > + * the writer context. So, scheduler update/switch code (which > + * acquires the same lock but in the reader context) can't run > + * concurrently. 
> + */ > + if (q->elevator) { > + et = xa_load(et_table, q->id); > + if (et) > + blk_mq_free_sched_tags(et, set); > + } > + } > +} > + > struct elevator_tags *blk_mq_alloc_sched_tags(struct blk_mq_tag_set *set, > unsigned int nr_hw_queues) > { > @@ -477,6 +501,45 @@ struct elevator_tags *blk_mq_alloc_sched_tags(struct blk_mq_tag_set *set, > return NULL; > } > > +int blk_mq_alloc_sched_tags_batch(struct xarray *et_table, > + struct blk_mq_tag_set *set, unsigned int nr_hw_queues) > +{ > + struct request_queue *q; > + struct elevator_tags *et; > + gfp_t gfp = GFP_NOIO | __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY; > + > + lockdep_assert_held_write(&set->update_nr_hwq_lock); > + > + list_for_each_entry(q, &set->tag_list, tag_set_list) { > + /* > + * Accessing q->elevator without holding q->elevator_lock is > + * safe because we're holding here set->update_nr_hwq_lock in > + * the writer context. So, scheduler update/switch code (which > + * acquires the same lock but in the reader context) can't run > + * concurrently. 
> + */ > + if (q->elevator) { > + et = blk_mq_alloc_sched_tags(set, nr_hw_queues); > + if (!et) > + goto out_unwind; > + if (xa_insert(et_table, q->id, et, gfp)) > + goto out_free_tags; > + } > + } > + return 0; > +out_free_tags: > + blk_mq_free_sched_tags(et, set); > +out_unwind: > + list_for_each_entry_continue_reverse(q, &set->tag_list, tag_set_list) { > + if (q->elevator) { > + et = xa_load(et_table, q->id); > + if (et) > + blk_mq_free_sched_tags(et, set); > + } > + } > + return -ENOMEM; > +} > + > /* caller must have a reference to @e, will grab another one if successful */ > int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e, > struct elevator_tags *et) > diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h > index 0cde00cd1c47..b554e1d55950 100644 > --- a/block/blk-mq-sched.h > +++ b/block/blk-mq-sched.h > @@ -25,8 +25,12 @@ void blk_mq_sched_free_rqs(struct request_queue *q); > > struct elevator_tags *blk_mq_alloc_sched_tags(struct blk_mq_tag_set *set, > unsigned int nr_hw_queues); > +int blk_mq_alloc_sched_tags_batch(struct xarray *et_table, > + struct blk_mq_tag_set *set, unsigned int nr_hw_queues); > void blk_mq_free_sched_tags(struct elevator_tags *et, > struct blk_mq_tag_set *set); > +void blk_mq_free_sched_tags_batch(struct xarray *et_table, > + struct blk_mq_tag_set *set); > > static inline void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx) > { > diff --git a/block/blk-mq.c b/block/blk-mq.c > index 4806b867e37d..a68b658ce07b 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -4972,6 +4972,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, > struct request_queue *q; > int prev_nr_hw_queues = set->nr_hw_queues; > unsigned int memflags; > + struct xarray et_table; > int i; > > lockdep_assert_held(&set->tag_list_lock); > @@ -4984,6 +4985,11 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, > return; > > memflags = memalloc_noio_save(); > + > + xa_init(&et_table); > + if 
(blk_mq_alloc_sched_tags_batch(&et_table, set, nr_hw_queues) < 0) > + goto out_memalloc_restore; > + > list_for_each_entry(q, &set->tag_list, tag_set_list) { > blk_mq_debugfs_unregister_hctxs(q); > blk_mq_sysfs_unregister_hctxs(q); > @@ -4995,6 +5001,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, > if (blk_mq_realloc_tag_set_tags(set, nr_hw_queues) < 0) { > list_for_each_entry(q, &set->tag_list, tag_set_list) > blk_mq_unfreeze_queue_nomemrestore(q); > + blk_mq_free_sched_tags_batch(&et_table, set); > goto reregister; > } > > @@ -5019,7 +5026,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, > > /* elv_update_nr_hw_queues() unfreeze queue for us */ > list_for_each_entry(q, &set->tag_list, tag_set_list) > - elv_update_nr_hw_queues(q); > + elv_update_nr_hw_queues(q, &et_table); > > reregister: > list_for_each_entry(q, &set->tag_list, tag_set_list) { > @@ -5029,7 +5036,9 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, > blk_mq_remove_hw_queues_cpuhp(q); > blk_mq_add_hw_queues_cpuhp(q); > } > +out_memalloc_restore: > memalloc_noio_restore(memflags); > + xa_destroy(&et_table); > > /* Free the excess tags when nr_hw_queues shrink. 
*/ > for (i = set->nr_hw_queues; i < prev_nr_hw_queues; i++) > diff --git a/block/blk.h b/block/blk.h > index 37ec459fe656..c6d1d1458388 100644 > --- a/block/blk.h > +++ b/block/blk.h > @@ -321,7 +321,7 @@ bool blk_bio_list_merge(struct request_queue *q, struct list_head *list, > > bool blk_insert_flush(struct request *rq); > > -void elv_update_nr_hw_queues(struct request_queue *q); > +void elv_update_nr_hw_queues(struct request_queue *q, struct xarray *et_table); > void elevator_set_default(struct request_queue *q); > void elevator_set_none(struct request_queue *q); > > diff --git a/block/elevator.c b/block/elevator.c > index 50f4b78efe66..8ba8b869d5a4 100644 > --- a/block/elevator.c > +++ b/block/elevator.c > @@ -705,7 +705,7 @@ static int elevator_change(struct request_queue *q, struct elv_change_ctx *ctx) > * The I/O scheduler depends on the number of hardware queues, this forces a > * reattachment when nr_hw_queues changes. > */ > -void elv_update_nr_hw_queues(struct request_queue *q) > +void elv_update_nr_hw_queues(struct request_queue *q, struct xarray *et_table) et_table isn't necessary to expose to elv_update_nr_hw_queues(), and it is less readable than passing 'struct elevator_tags *' directly, but it can be one followup improvement. Anyway: Reviewed-by: Ming Lei <ming.lei@redhat.com> Thanks, Ming ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCHv6 3/3] block: fix potential deadlock while running nr_hw_queue update 2025-07-01 3:24 ` Ming Lei @ 2025-07-01 5:20 ` Nilay Shroff 0 siblings, 0 replies; 15+ messages in thread From: Nilay Shroff @ 2025-07-01 5:20 UTC (permalink / raw) To: Ming Lei; +Cc: linux-block, hch, axboe, sth, lkp, gjoyce On 7/1/25 8:54 AM, Ming Lei wrote: >> diff --git a/block/elevator.c b/block/elevator.c >> index 50f4b78efe66..8ba8b869d5a4 100644 >> --- a/block/elevator.c >> +++ b/block/elevator.c >> @@ -705,7 +705,7 @@ static int elevator_change(struct request_queue *q, struct elv_change_ctx *ctx) >> * The I/O scheduler depends on the number of hardware queues, this forces a >> * reattachment when nr_hw_queues changes. >> */ >> -void elv_update_nr_hw_queues(struct request_queue *q) >> +void elv_update_nr_hw_queues(struct request_queue *q, struct xarray *et_table) > > et_table isn't necessary to expose to elv_update_nr_hw_queues(), and it is > less readable than passing 'struct elevator_tags *' directly, but it can be > one followup improvement. > Yeah, makes sense... I will update this in the subsequent patchset. Thanks, --Nilay ^ permalink raw reply [flat|nested] 15+ messages in thread