public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning
@ 2025-05-05 14:17 Ming Lei
  2025-05-05 14:17 ` [PATCH V5 01/25] block: move blk_mq_add_queue_tag_set() after blk_mq_map_swqueue() Ming Lei
                   ` (25 more replies)
  0 siblings, 26 replies; 42+ messages in thread
From: Ming Lei @ 2025-05-05 14:17 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Nilay Shroff, Shinichiro Kawasaki, Thomas Hellström,
	Christoph Hellwig, Ming Lei

Hello Jens,

This patchset cleans up elevator change code, and unifying it via single
helper, meantime moves kobject_add/del & debugfs register/unregister out of
queue freezing & elevator_lock. This way fixes many lockdep warnings
reported recently, especially since fs_reclaim is connected with freeze lock
manually by commit ffa1e7ada456 ("block: Make request_queue lockdep splats
show up earlier").


Thanks,
Ming

V5:
	- replace down_read_nested() with down_read(), by adding
	  helper of add_disk_final()

	- fix race between elv_iosched_store and __del_gendisk (Nilay Shoff)

	- improve elv_update_nr_hw_queues() (Christoph)

V4:
	- pull Christoph's two elevator change/switch cleanup patches first,
	then the following elevator change unifying is simplified a lot(Christoph)

	- fold Nilay's patch into the last patch for avoiding one new
	lock warning on rqos_state_mutex (Nilay Shoff)

	- drop patch "block: move blk_unregister_queue() & device_del() after freeze
	wait" which may cause MD hang during shutdown, and add new patch "block: add
    new helper for disabling elevator switch when deleting disk" for dealing with
    one small race window (Christoph)

	- add patch "block: remove elevator queue's type check in elv_attr_show/store()"
	(Christoph)

    - rename ->update_nr_hwq_sema as ->update_nr_hwq_lock (Christoph)

	- add small patch "block: move blk_queue_registered() check into elv_iosched_store()"

V3:
	- replace srcu with rw_sem for avoiding race between add/del disk &
	  elevator switch and updating nr_hw_queues (Nilay Shoff)

	- add elv_update_nr_hw_queues() for elevator reattachment in case of
	updating nr_hw_queues, meantime keep elv_change_ctx as local structure
	(Christoph)

	- replace ->elevator_lock with disk->rqos_state_mutex for covering wbt
	state change

	- add new patch "block: use q->elevator with ->elevator_lock held in elv_iosched_show()"

	- small cleanup & commit log improvement

V2:
	- retry add/del disk when blk_mq_update_nr_hw_queues() is in-progress

	- swap blk_mq_add_queue_tag_set() with blk_mq_map_swqueue() in
	blk_mq_init_allocated_queue() (Nilay Shroff)

	- move ELEVATOR_FLAG_DISABLE_WBT to request queue's flags (Nilay Shoff) 

	- fix race because of delaying elevator unregister

	- define flags of `elv_change_ctx` as `bool` (Christoph)

	- improve comment and commit log (Christoph)



Christoph Hellwig (2):
  block: look up the elevator type in elevator_switch
  block: fold elevator_disable into elevator_switch

Ming Lei (23):
  block: move blk_mq_add_queue_tag_set() after blk_mq_map_swqueue()
  block: move ELEVATOR_FLAG_DISABLE_WBT a request queue flag
  block: don't call freeze queue in elevator_switch() and
    elevator_disable()
  block: use q->elevator with ->elevator_lock held in elv_iosched_show()
  block: add two helpers for registering/un-registering sched debugfs
  block: move sched debugfs register into elvevator_register_queue
  block: add helper add_disk_final()
  block: prevent adding/deleting disk during updating nr_hw_queues
  block: don't allow to switch elevator if updating nr_hw_queues is
    in-progress
  block: move blk_queue_registered() check into elv_iosched_store()
  block: simplify elevator reattachment for updating nr_hw_queues
  block: move queue freezing & elevator_lock into elevator_change()
  block: add `struct elv_change_ctx` for unifying elevator change
  block: unifying elevator change
  block: pass elevator_queue to elv_register_queue & unregister_queue
  block: remove elevator queue's type check in elv_attr_show/store()
  block: fail to show/store elevator sysfs attribute if elevator is
    dying
  block: add new helper for disabling elevator switch when deleting disk
  block: move elv_register[unregister]_queue out of elevator_lock
  block: move hctx debugfs/sysfs registering out of freezing queue
  block: don't acquire ->elevator_lock in blk_mq_map_swqueue and
    blk_mq_realloc_hw_ctxs
  block: move hctx cpuhp add/del out of queue freezing
  block: move wbt_enable_default() out of queue freezing from sched
    ->exit()

 block/bfq-iosched.c    |   6 +-
 block/blk-mq-debugfs.c |  13 +-
 block/blk-mq-sched.c   |  41 +++--
 block/blk-mq.c         | 132 +++--------------
 block/blk-sysfs.c      |  29 ++--
 block/blk-wbt.c        |   9 +-
 block/blk.h            |   8 +-
 block/elevator.c       | 329 ++++++++++++++++++++++++-----------------
 block/elevator.h       |   6 +-
 block/genhd.c          | 197 +++++++++++++++---------
 include/linux/blk-mq.h |   3 +
 include/linux/blkdev.h |   8 +
 12 files changed, 402 insertions(+), 379 deletions(-)

-- 
2.47.0


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH V5 01/25] block: move blk_mq_add_queue_tag_set() after blk_mq_map_swqueue()
  2025-05-05 14:17 [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Ming Lei
@ 2025-05-05 14:17 ` Ming Lei
  2025-05-05 14:17 ` [PATCH V5 02/25] block: move ELEVATOR_FLAG_DISABLE_WBT a request queue flag Ming Lei
                   ` (24 subsequent siblings)
  25 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2025-05-05 14:17 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Nilay Shroff, Shinichiro Kawasaki, Thomas Hellström,
	Christoph Hellwig, Ming Lei, Hannes Reinecke

Move blk_mq_add_queue_tag_set() after blk_mq_map_swqueue(), and publish
this request queue to tagset after everything is setup.

This way is safe because BLK_MQ_F_TAG_QUEUE_SHARED isn't used by
blk_mq_map_swqueue(), and this flag is mainly checked in fast IO code
path.

Prepare for removing ->elevator_lock from blk_mq_map_swqueue() which
is supposed to be called when elevator switch can't be done.

Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reported-by: Nilay Shroff <nilay@linux.ibm.com>
Closes: https://lore.kernel.org/linux-block/567cb7ab-23d6-4cee-a915-c8cdac903ddd@linux.ibm.com/
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 796baeccd37b..3292f2c4147a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4627,8 +4627,8 @@ int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	q->nr_requests = set->queue_depth;
 
 	blk_mq_init_cpu_queues(q, set->nr_hw_queues);
-	blk_mq_add_queue_tag_set(set, q);
 	blk_mq_map_swqueue(q);
+	blk_mq_add_queue_tag_set(set, q);
 	return 0;
 
 err_hctxs:
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH V5 02/25] block: move ELEVATOR_FLAG_DISABLE_WBT a request queue flag
  2025-05-05 14:17 [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Ming Lei
  2025-05-05 14:17 ` [PATCH V5 01/25] block: move blk_mq_add_queue_tag_set() after blk_mq_map_swqueue() Ming Lei
@ 2025-05-05 14:17 ` Ming Lei
  2025-05-05 14:17 ` [PATCH V5 03/25] block: don't call freeze queue in elevator_switch() and elevator_disable() Ming Lei
                   ` (23 subsequent siblings)
  25 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2025-05-05 14:17 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Nilay Shroff, Shinichiro Kawasaki, Thomas Hellström,
	Christoph Hellwig, Ming Lei, Hannes Reinecke

ELEVATOR_FLAG_DISABLE_WBT is only used by BFQ to disallow wbt when BFQ is
in use. The flag is set in BFQ's init(), and cleared in BFQ's exit().

Making it as request queue flag, so that we can avoid to deal with elevator
switch race. Also it isn't graceful to checking one scheduler flag in
wbt_enable_default().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/bfq-iosched.c    | 4 ++--
 block/blk-mq-debugfs.c | 1 +
 block/blk-wbt.c        | 3 +--
 block/elevator.h       | 1 -
 include/linux/blkdev.h | 3 +++
 5 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index abd80dc13562..cc6f59836dcd 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -7210,7 +7210,7 @@ static void bfq_exit_queue(struct elevator_queue *e)
 #endif
 
 	blk_stat_disable_accounting(bfqd->queue);
-	clear_bit(ELEVATOR_FLAG_DISABLE_WBT, &e->flags);
+	blk_queue_flag_clear(QUEUE_FLAG_DISABLE_WBT_DEF, bfqd->queue);
 	wbt_enable_default(bfqd->queue->disk);
 
 	kfree(bfqd);
@@ -7397,7 +7397,7 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
 	/* We dispatch from request queue wide instead of hw queue */
 	blk_queue_flag_set(QUEUE_FLAG_SQ_SCHED, q);
 
-	set_bit(ELEVATOR_FLAG_DISABLE_WBT, &eq->flags);
+	blk_queue_flag_set(QUEUE_FLAG_DISABLE_WBT_DEF, q);
 	wbt_disable_default(q->disk);
 	blk_stat_enable_accounting(q);
 
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 3421b5521fe2..7710c409e432 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -93,6 +93,7 @@ static const char *const blk_queue_flag_name[] = {
 	QUEUE_FLAG_NAME(RQ_ALLOC_TIME),
 	QUEUE_FLAG_NAME(HCTX_ACTIVE),
 	QUEUE_FLAG_NAME(SQ_SCHED),
+	QUEUE_FLAG_NAME(DISABLE_WBT_DEF),
 };
 #undef QUEUE_FLAG_NAME
 
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index f1754d07f7e0..29cd2e33666f 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -704,8 +704,7 @@ void wbt_enable_default(struct gendisk *disk)
 	struct rq_qos *rqos;
 	bool enable = IS_ENABLED(CONFIG_BLK_WBT_MQ);
 
-	if (q->elevator &&
-	    test_bit(ELEVATOR_FLAG_DISABLE_WBT, &q->elevator->flags))
+	if (blk_queue_disable_wbt(q))
 		enable = false;
 
 	/* Throttling already enabled? */
diff --git a/block/elevator.h b/block/elevator.h
index e4e44dfac503..e27af5492cdb 100644
--- a/block/elevator.h
+++ b/block/elevator.h
@@ -121,7 +121,6 @@ struct elevator_queue
 };
 
 #define ELEVATOR_FLAG_REGISTERED	0
-#define ELEVATOR_FLAG_DISABLE_WBT	1
 
 /*
  * block elevator interface
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f3d74f9dae8e..9c373cf0eb47 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -644,6 +644,7 @@ enum {
 	QUEUE_FLAG_RQ_ALLOC_TIME,	/* record rq->alloc_time_ns */
 	QUEUE_FLAG_HCTX_ACTIVE,		/* at least one blk-mq hctx is active */
 	QUEUE_FLAG_SQ_SCHED,		/* single queue style io dispatch */
+	QUEUE_FLAG_DISABLE_WBT_DEF,	/* for sched to disable/enable wbt */
 	QUEUE_FLAG_MAX
 };
 
@@ -679,6 +680,8 @@ void blk_queue_flag_clear(unsigned int flag, struct request_queue *q);
 #define blk_queue_sq_sched(q)	test_bit(QUEUE_FLAG_SQ_SCHED, &(q)->queue_flags)
 #define blk_queue_skip_tagset_quiesce(q) \
 	((q)->limits.features & BLK_FEAT_SKIP_TAGSET_QUIESCE)
+#define blk_queue_disable_wbt(q)	\
+	test_bit(QUEUE_FLAG_DISABLE_WBT_DEF, &(q)->queue_flags)
 
 extern void blk_set_pm_only(struct request_queue *q);
 extern void blk_clear_pm_only(struct request_queue *q);
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH V5 03/25] block: don't call freeze queue in elevator_switch() and elevator_disable()
  2025-05-05 14:17 [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Ming Lei
  2025-05-05 14:17 ` [PATCH V5 01/25] block: move blk_mq_add_queue_tag_set() after blk_mq_map_swqueue() Ming Lei
  2025-05-05 14:17 ` [PATCH V5 02/25] block: move ELEVATOR_FLAG_DISABLE_WBT a request queue flag Ming Lei
@ 2025-05-05 14:17 ` Ming Lei
  2025-05-05 14:17 ` [PATCH V5 04/25] block: use q->elevator with ->elevator_lock held in elv_iosched_show() Ming Lei
                   ` (22 subsequent siblings)
  25 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2025-05-05 14:17 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Nilay Shroff, Shinichiro Kawasaki, Thomas Hellström,
	Christoph Hellwig, Ming Lei, Hannes Reinecke

Both elevator_switch() and elevator_disable() are only called from the
two code paths, in which queue is guaranteed to be frozen.

So don't call freeze queue in the two functions, also add asserts for
queue freeze.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/elevator.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/block/elevator.c b/block/elevator.c
index b4d08026b02c..5051a98dc08c 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -615,12 +615,11 @@ void elevator_init_mq(struct request_queue *q)
  */
 int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
 {
-	unsigned int memflags;
 	int ret;
 
+	WARN_ON_ONCE(q->mq_freeze_depth == 0);
 	lockdep_assert_held(&q->elevator_lock);
 
-	memflags = blk_mq_freeze_queue(q);
 	blk_mq_quiesce_queue(q);
 
 	if (q->elevator) {
@@ -641,7 +640,6 @@ int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
 
 out_unfreeze:
 	blk_mq_unquiesce_queue(q);
-	blk_mq_unfreeze_queue(q, memflags);
 
 	if (ret) {
 		pr_warn("elv: switch to \"%s\" failed, falling back to \"none\"\n",
@@ -653,11 +651,9 @@ int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
 
 void elevator_disable(struct request_queue *q)
 {
-	unsigned int memflags;
-
+	WARN_ON_ONCE(q->mq_freeze_depth == 0);
 	lockdep_assert_held(&q->elevator_lock);
 
-	memflags = blk_mq_freeze_queue(q);
 	blk_mq_quiesce_queue(q);
 
 	elv_unregister_queue(q);
@@ -668,7 +664,6 @@ void elevator_disable(struct request_queue *q)
 	blk_add_trace_msg(q, "elv switch: none");
 
 	blk_mq_unquiesce_queue(q);
-	blk_mq_unfreeze_queue(q, memflags);
 }
 
 /*
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH V5 04/25] block: use q->elevator with ->elevator_lock held in elv_iosched_show()
  2025-05-05 14:17 [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Ming Lei
                   ` (2 preceding siblings ...)
  2025-05-05 14:17 ` [PATCH V5 03/25] block: don't call freeze queue in elevator_switch() and elevator_disable() Ming Lei
@ 2025-05-05 14:17 ` Ming Lei
  2025-05-05 14:17 ` [PATCH V5 05/25] block: add two helpers for registering/un-registering sched debugfs Ming Lei
                   ` (21 subsequent siblings)
  25 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2025-05-05 14:17 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Nilay Shroff, Shinichiro Kawasaki, Thomas Hellström,
	Christoph Hellwig, Ming Lei, Hannes Reinecke

Use q->elevator with ->elevator_lock held in elv_iosched_show(), since
the local cached elevator reference may become stale after getting
->elevator_lock.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/elevator.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/elevator.c b/block/elevator.c
index 5051a98dc08c..b32815594892 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -739,7 +739,6 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
 ssize_t elv_iosched_show(struct gendisk *disk, char *name)
 {
 	struct request_queue *q = disk->queue;
-	struct elevator_queue *eq = q->elevator;
 	struct elevator_type *cur = NULL, *e;
 	int len = 0;
 
@@ -748,7 +747,7 @@ ssize_t elv_iosched_show(struct gendisk *disk, char *name)
 		len += sprintf(name+len, "[none] ");
 	} else {
 		len += sprintf(name+len, "none ");
-		cur = eq->type;
+		cur = q->elevator->type;
 	}
 
 	spin_lock(&elv_list_lock);
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH V5 05/25] block: add two helpers for registering/un-registering sched debugfs
  2025-05-05 14:17 [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Ming Lei
                   ` (3 preceding siblings ...)
  2025-05-05 14:17 ` [PATCH V5 04/25] block: use q->elevator with ->elevator_lock held in elv_iosched_show() Ming Lei
@ 2025-05-05 14:17 ` Ming Lei
  2025-05-05 14:17 ` [PATCH V5 06/25] block: move sched debugfs register into elvevator_register_queue Ming Lei
                   ` (20 subsequent siblings)
  25 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2025-05-05 14:17 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Nilay Shroff, Shinichiro Kawasaki, Thomas Hellström,
	Christoph Hellwig, Ming Lei, Yu Kuai, Hannes Reinecke

Add blk_mq_sched_reg_debugfs()/blk_mq_sched_unreg_debugfs() to clean up
sched init/exit code a bit.

Register & unregister debugfs for sched & sched_hctx order is changed a
bit, but it is safe because sched & sched_hctx is guaranteed to be ready
when exporting via debugfs.

Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.c | 45 +++++++++++++++++++++++++++++---------------
 1 file changed, 30 insertions(+), 15 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 9b81771774ef..2abc5e0704e8 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -434,6 +434,30 @@ static int blk_mq_init_sched_shared_tags(struct request_queue *queue)
 	return 0;
 }
 
+static void blk_mq_sched_reg_debugfs(struct request_queue *q)
+{
+	struct blk_mq_hw_ctx *hctx;
+	unsigned long i;
+
+	mutex_lock(&q->debugfs_mutex);
+	blk_mq_debugfs_register_sched(q);
+	queue_for_each_hw_ctx(q, hctx, i)
+		blk_mq_debugfs_register_sched_hctx(q, hctx);
+	mutex_unlock(&q->debugfs_mutex);
+}
+
+static void blk_mq_sched_unreg_debugfs(struct request_queue *q)
+{
+	struct blk_mq_hw_ctx *hctx;
+	unsigned long i;
+
+	mutex_lock(&q->debugfs_mutex);
+	queue_for_each_hw_ctx(q, hctx, i)
+		blk_mq_debugfs_unregister_sched_hctx(hctx);
+	blk_mq_debugfs_unregister_sched(q);
+	mutex_unlock(&q->debugfs_mutex);
+}
+
 /* caller must have a reference to @e, will grab another one if successful */
 int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
 {
@@ -467,10 +491,6 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
 	if (ret)
 		goto err_free_map_and_rqs;
 
-	mutex_lock(&q->debugfs_mutex);
-	blk_mq_debugfs_register_sched(q);
-	mutex_unlock(&q->debugfs_mutex);
-
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if (e->ops.init_hctx) {
 			ret = e->ops.init_hctx(hctx, i);
@@ -482,11 +502,11 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
 				return ret;
 			}
 		}
-		mutex_lock(&q->debugfs_mutex);
-		blk_mq_debugfs_register_sched_hctx(q, hctx);
-		mutex_unlock(&q->debugfs_mutex);
 	}
 
+	/* sched is initialized, it is ready to export it via debugfs */
+	blk_mq_sched_reg_debugfs(q);
+
 	return 0;
 
 err_free_map_and_rqs:
@@ -524,11 +544,10 @@ void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e)
 	unsigned long i;
 	unsigned int flags = 0;
 
-	queue_for_each_hw_ctx(q, hctx, i) {
-		mutex_lock(&q->debugfs_mutex);
-		blk_mq_debugfs_unregister_sched_hctx(hctx);
-		mutex_unlock(&q->debugfs_mutex);
+	/* unexport via debugfs before exiting sched */
+	blk_mq_sched_unreg_debugfs(q);
 
+	queue_for_each_hw_ctx(q, hctx, i) {
 		if (e->type->ops.exit_hctx && hctx->sched_data) {
 			e->type->ops.exit_hctx(hctx, i);
 			hctx->sched_data = NULL;
@@ -536,10 +555,6 @@ void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e)
 		flags = hctx->flags;
 	}
 
-	mutex_lock(&q->debugfs_mutex);
-	blk_mq_debugfs_unregister_sched(q);
-	mutex_unlock(&q->debugfs_mutex);
-
 	if (e->type->ops.exit_sched)
 		e->type->ops.exit_sched(e);
 	blk_mq_sched_tags_teardown(q, flags);
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH V5 06/25] block: move sched debugfs register into elvevator_register_queue
  2025-05-05 14:17 [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Ming Lei
                   ` (4 preceding siblings ...)
  2025-05-05 14:17 ` [PATCH V5 05/25] block: add two helpers for registering/un-registering sched debugfs Ming Lei
@ 2025-05-05 14:17 ` Ming Lei
  2025-05-05 14:17 ` [PATCH V5 07/25] block: add helper add_disk_final() Ming Lei
                   ` (19 subsequent siblings)
  25 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2025-05-05 14:17 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Nilay Shroff, Shinichiro Kawasaki, Thomas Hellström,
	Christoph Hellwig, Ming Lei, Yu Kuai, Hannes Reinecke

sched debugfs shares same lifetime with scheduler's kobject, and same
lock(elevator lock), so move sched debugfs register/unregister into
elevator_register_queue() and elevator_unregister_queue().

Then we needn't blk_mq_debugfs_register() for us to register sched
debugfs any more.

Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-debugfs.c | 11 -----------
 block/blk-mq-sched.c   | 11 ++---------
 block/elevator.c       |  8 ++++++++
 block/elevator.h       |  3 +++
 4 files changed, 13 insertions(+), 20 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 7710c409e432..2837a8ce8054 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -625,20 +625,9 @@ void blk_mq_debugfs_register(struct request_queue *q)
 
 	debugfs_create_files(q->debugfs_dir, q, blk_mq_debugfs_queue_attrs);
 
-	/*
-	 * blk_mq_init_sched() attempted to do this already, but q->debugfs_dir
-	 * didn't exist yet (because we don't know what to name the directory
-	 * until the queue is registered to a gendisk).
-	 */
-	if (q->elevator && !q->sched_debugfs_dir)
-		blk_mq_debugfs_register_sched(q);
-
-	/* Similarly, blk_mq_init_hctx() couldn't do this previously. */
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if (!hctx->debugfs_dir)
 			blk_mq_debugfs_register_hctx(q, hctx);
-		if (q->elevator && !hctx->sched_debugfs_dir)
-			blk_mq_debugfs_register_sched_hctx(q, hctx);
 	}
 
 	if (q->rq_qos) {
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 2abc5e0704e8..336a15ffecfa 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -434,7 +434,7 @@ static int blk_mq_init_sched_shared_tags(struct request_queue *queue)
 	return 0;
 }
 
-static void blk_mq_sched_reg_debugfs(struct request_queue *q)
+void blk_mq_sched_reg_debugfs(struct request_queue *q)
 {
 	struct blk_mq_hw_ctx *hctx;
 	unsigned long i;
@@ -446,7 +446,7 @@ static void blk_mq_sched_reg_debugfs(struct request_queue *q)
 	mutex_unlock(&q->debugfs_mutex);
 }
 
-static void blk_mq_sched_unreg_debugfs(struct request_queue *q)
+void blk_mq_sched_unreg_debugfs(struct request_queue *q)
 {
 	struct blk_mq_hw_ctx *hctx;
 	unsigned long i;
@@ -503,10 +503,6 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
 			}
 		}
 	}
-
-	/* sched is initialized, it is ready to export it via debugfs */
-	blk_mq_sched_reg_debugfs(q);
-
 	return 0;
 
 err_free_map_and_rqs:
@@ -544,9 +540,6 @@ void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e)
 	unsigned long i;
 	unsigned int flags = 0;
 
-	/* unexport via debugfs before exiting sched */
-	blk_mq_sched_unreg_debugfs(q);
-
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if (e->type->ops.exit_hctx && hctx->sched_data) {
 			e->type->ops.exit_hctx(hctx, i);
diff --git a/block/elevator.c b/block/elevator.c
index b32815594892..4400eb8fe54f 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -472,6 +472,11 @@ int elv_register_queue(struct request_queue *q, bool uevent)
 		if (uevent)
 			kobject_uevent(&e->kobj, KOBJ_ADD);
 
+		/*
+		 * Sched is initialized, it is ready to export it via
+		 * debugfs
+		 */
+		blk_mq_sched_reg_debugfs(q);
 		set_bit(ELEVATOR_FLAG_REGISTERED, &e->flags);
 	}
 	return error;
@@ -486,6 +491,9 @@ void elv_unregister_queue(struct request_queue *q)
 	if (e && test_and_clear_bit(ELEVATOR_FLAG_REGISTERED, &e->flags)) {
 		kobject_uevent(&e->kobj, KOBJ_REMOVE);
 		kobject_del(&e->kobj);
+
+		/* unexport via debugfs before exiting sched */
+		blk_mq_sched_unreg_debugfs(q);
 	}
 }
 
diff --git a/block/elevator.h b/block/elevator.h
index e27af5492cdb..9198676644a9 100644
--- a/block/elevator.h
+++ b/block/elevator.h
@@ -181,4 +181,7 @@ extern struct request *elv_rb_find(struct rb_root *, sector_t);
 #define rq_entry_fifo(ptr)	list_entry((ptr), struct request, queuelist)
 #define rq_fifo_clear(rq)	list_del_init(&(rq)->queuelist)
 
+void blk_mq_sched_reg_debugfs(struct request_queue *q);
+void blk_mq_sched_unreg_debugfs(struct request_queue *q);
+
 #endif /* _ELEVATOR_H */
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH V5 07/25] block: add helper add_disk_final()
  2025-05-05 14:17 [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Ming Lei
                   ` (5 preceding siblings ...)
  2025-05-05 14:17 ` [PATCH V5 06/25] block: move sched debugfs register into elvevator_register_queue Ming Lei
@ 2025-05-05 14:17 ` Ming Lei
  2025-05-06  4:40   ` Christoph Hellwig
                     ` (2 more replies)
  2025-05-05 14:17 ` [PATCH V5 08/25] block: prevent adding/deleting disk during updating nr_hw_queues Ming Lei
                   ` (18 subsequent siblings)
  25 siblings, 3 replies; 42+ messages in thread
From: Ming Lei @ 2025-05-05 14:17 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Nilay Shroff, Shinichiro Kawasaki, Thomas Hellström,
	Christoph Hellwig, Ming Lei

Add helper add_disk_final() for scanning partitions, announcing disk and
handling the last thing for adding disk.

No functional change, and prepare for prevent adding disk from happening
when updating nr_hw_queues.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/genhd.c | 46 +++++++++++++++++++++++++++-------------------
 1 file changed, 27 insertions(+), 19 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index c2bd86cd09de..50f3deeec5e3 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -389,6 +389,32 @@ int disk_scan_partitions(struct gendisk *disk, blk_mode_t mode)
 	return ret;
 }
 
+static void add_disk_final(struct gendisk *disk)
+{
+	struct device *ddev = disk_to_dev(disk);
+
+	if (!(disk->flags & GENHD_FL_HIDDEN)) {
+		/* Make sure the first partition scan will be proceed */
+		if (get_capacity(disk) && disk_has_partscan(disk))
+			set_bit(GD_NEED_PART_SCAN, &disk->state);
+
+		bdev_add(disk->part0, ddev->devt);
+		if (get_capacity(disk))
+			disk_scan_partitions(disk, BLK_OPEN_READ);
+
+		/*
+		 * Announce the disk and partitions after all partitions are
+		 * created. (for hidden disks uevents remain suppressed forever)
+		 */
+		dev_set_uevent_suppress(ddev, 0);
+		disk_uevent(disk, KOBJ_ADD);
+	}
+
+	blk_apply_bdi_limits(disk->bdi, &disk->queue->limits);
+	disk_add_events(disk);
+	set_bit(GD_ADDED, &disk->state);
+}
+
 /**
  * add_disk_fwnode - add disk information to kernel list with fwnode
  * @parent: parent device for the disk
@@ -516,21 +542,6 @@ int __must_check add_disk_fwnode(struct device *parent, struct gendisk *disk,
 					&disk->bdi->dev->kobj, "bdi");
 		if (ret)
 			goto out_unregister_bdi;
-
-		/* Make sure the first partition scan will be proceed */
-		if (get_capacity(disk) && disk_has_partscan(disk))
-			set_bit(GD_NEED_PART_SCAN, &disk->state);
-
-		bdev_add(disk->part0, ddev->devt);
-		if (get_capacity(disk))
-			disk_scan_partitions(disk, BLK_OPEN_READ);
-
-		/*
-		 * Announce the disk and partitions after all partitions are
-		 * created. (for hidden disks uevents remain suppressed forever)
-		 */
-		dev_set_uevent_suppress(ddev, 0);
-		disk_uevent(disk, KOBJ_ADD);
 	} else {
 		/*
 		 * Even if the block_device for a hidden gendisk is not
@@ -539,10 +550,7 @@ int __must_check add_disk_fwnode(struct device *parent, struct gendisk *disk,
 		 */
 		disk->part0->bd_dev = MKDEV(disk->major, disk->first_minor);
 	}
-
-	blk_apply_bdi_limits(disk->bdi, &disk->queue->limits);
-	disk_add_events(disk);
-	set_bit(GD_ADDED, &disk->state);
+	add_disk_final(disk);
 	return 0;
 
 out_unregister_bdi:
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH V5 08/25] block: prevent adding/deleting disk during updating nr_hw_queues
  2025-05-05 14:17 [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Ming Lei
                   ` (6 preceding siblings ...)
  2025-05-05 14:17 ` [PATCH V5 07/25] block: add helper add_disk_final() Ming Lei
@ 2025-05-05 14:17 ` Ming Lei
  2025-05-05 14:17 ` [PATCH V5 09/25] block: don't allow to switch elevator if updating nr_hw_queues is in-progress Ming Lei
                   ` (17 subsequent siblings)
  25 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2025-05-05 14:17 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Nilay Shroff, Shinichiro Kawasaki, Thomas Hellström,
	Christoph Hellwig, Ming Lei, Hannes Reinecke

Both adding/deleting disk code are reader of `nr_hw_queues`, so we can't
allow them in-progress when updating nr_hw_queues, kernel panic and
kasan has been reported in [1].

Prevent adding/deleting disk during updating nr_hw_queues by adding
rw_semaphore to tagset, write lock is grabbed in blk_mq_update_nr_hw_queues(),
and read lock is acquired when adding/deleting disk.

Also mark GFP_NOIO allocation scope for adding/deleting disk because
blk_mq_update_nr_hw_queues() is part of some driver's error handler.

This way avoids lot of trouble.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Suggested-by: Nilay Shroff <nilay@linux.ibm.com>
Reported-by: Nilay Shroff <nilay@linux.ibm.com>
Closes: https://lore.kernel.org/linux-block/a5896cdb-a59a-4a37-9f99-20522f5d2987@linux.ibm.com/
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c         |   4 ++
 block/genhd.c          | 113 ++++++++++++++++++++++++++++-------------
 include/linux/blk-mq.h |   3 ++
 3 files changed, 86 insertions(+), 34 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3292f2c4147a..240afa4c6ab3 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4848,6 +4848,8 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
 			goto out_free_srcu;
 	}
 
+	init_rwsem(&set->update_nr_hwq_lock);
+
 	ret = -ENOMEM;
 	set->tags = kcalloc_node(set->nr_hw_queues,
 				 sizeof(struct blk_mq_tags *), GFP_KERNEL,
@@ -5143,9 +5145,11 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 
 void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues)
 {
+	down_write(&set->update_nr_hwq_lock);
 	mutex_lock(&set->tag_list_lock);
 	__blk_mq_update_nr_hw_queues(set, nr_hw_queues);
 	mutex_unlock(&set->tag_list_lock);
+	up_write(&set->update_nr_hwq_lock);
 }
 EXPORT_SYMBOL_GPL(blk_mq_update_nr_hw_queues);
 
diff --git a/block/genhd.c b/block/genhd.c
index 50f3deeec5e3..e0dd8ecc925f 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -415,19 +415,9 @@ static void add_disk_final(struct gendisk *disk)
 	set_bit(GD_ADDED, &disk->state);
 }
 
-/**
- * add_disk_fwnode - add disk information to kernel list with fwnode
- * @parent: parent device for the disk
- * @disk: per-device partitioning information
- * @groups: Additional per-device sysfs groups
- * @fwnode: attached disk fwnode
- *
- * This function registers the partitioning information in @disk
- * with the kernel. Also attach a fwnode to the disk device.
- */
-int __must_check add_disk_fwnode(struct device *parent, struct gendisk *disk,
-				 const struct attribute_group **groups,
-				 struct fwnode_handle *fwnode)
+static int __add_disk(struct device *parent, struct gendisk *disk,
+		      const struct attribute_group **groups,
+		      struct fwnode_handle *fwnode)
 
 {
 	struct device *ddev = disk_to_dev(disk);
@@ -550,7 +540,6 @@ int __must_check add_disk_fwnode(struct device *parent, struct gendisk *disk,
 		 */
 		disk->part0->bd_dev = MKDEV(disk->major, disk->first_minor);
 	}
-	add_disk_final(disk);
 	return 0;
 
 out_unregister_bdi:
@@ -580,6 +569,45 @@ int __must_check add_disk_fwnode(struct device *parent, struct gendisk *disk,
 	}
 	return ret;
 }
+
+/**
+ * add_disk_fwnode - add disk information to kernel list with fwnode
+ * @parent: parent device for the disk
+ * @disk: per-device partitioning information
+ * @groups: Additional per-device sysfs groups
+ * @fwnode: attached disk fwnode
+ *
+ * This function registers the partitioning information in @disk
+ * with the kernel. Also attach a fwnode to the disk device.
+ */
+int __must_check add_disk_fwnode(struct device *parent, struct gendisk *disk,
+				 const struct attribute_group **groups,
+				 struct fwnode_handle *fwnode)
+{
+	struct blk_mq_tag_set *set;
+	unsigned int memflags;
+	int ret;
+
+	if (queue_is_mq(disk->queue)) {
+		set = disk->queue->tag_set;
+		memflags = memalloc_noio_save();
+		down_read(&set->update_nr_hwq_lock);
+		ret = __add_disk(parent, disk, groups, fwnode);
+		up_read(&set->update_nr_hwq_lock);
+		memalloc_noio_restore(memflags);
+	} else {
+		ret = __add_disk(parent, disk, groups, fwnode);
+	}
+
+	/*
+	 * add_disk_final() needn't to read `nr_hw_queues`, so move it out
+	 * of read lock `set->update_nr_hwq_lock` for avoiding unnecessary
+	 * lock dependency on `disk->open_mutex` from scanning partition.
+	 */
+	if (!ret)
+		add_disk_final(disk);
+	return ret;
+}
 EXPORT_SYMBOL_GPL(add_disk_fwnode);
 
 /**
@@ -660,26 +688,7 @@ void blk_mark_disk_dead(struct gendisk *disk)
 }
 EXPORT_SYMBOL_GPL(blk_mark_disk_dead);
 
-/**
- * del_gendisk - remove the gendisk
- * @disk: the struct gendisk to remove
- *
- * Removes the gendisk and all its associated resources. This deletes the
- * partitions associated with the gendisk, and unregisters the associated
- * request_queue.
- *
- * This is the counter to the respective __device_add_disk() call.
- *
- * The final removal of the struct gendisk happens when its refcount reaches 0
- * with put_disk(), which should be called after del_gendisk(), if
- * __device_add_disk() was used.
- *
- * Drivers exist which depend on the release of the gendisk to be synchronous,
- * it should not be deferred.
- *
- * Context: can sleep
- */
-void del_gendisk(struct gendisk *disk)
+static void __del_gendisk(struct gendisk *disk)
 {
 	struct request_queue *q = disk->queue;
 	struct block_device *part;
@@ -772,6 +781,42 @@ void del_gendisk(struct gendisk *disk)
 	if (start_drain)
 		blk_unfreeze_release_lock(q);
 }
+
+/**
+ * del_gendisk - remove the gendisk
+ * @disk: the struct gendisk to remove
+ *
+ * Removes the gendisk and all its associated resources. This deletes the
+ * partitions associated with the gendisk, and unregisters the associated
+ * request_queue.
+ *
+ * This is the counter to the respective __device_add_disk() call.
+ *
+ * The final removal of the struct gendisk happens when its refcount reaches 0
+ * with put_disk(), which should be called after del_gendisk(), if
+ * __device_add_disk() was used.
+ *
+ * Drivers exist which depend on the release of the gendisk to be synchronous,
+ * it should not be deferred.
+ *
+ * Context: can sleep
+ */
+void del_gendisk(struct gendisk *disk)
+{
+	struct blk_mq_tag_set *set;
+	unsigned int memflags;
+
+	if (!queue_is_mq(disk->queue)) {
+		__del_gendisk(disk);
+	} else {
+		set = disk->queue->tag_set;
+		memflags = memalloc_noio_save();
+		down_read(&set->update_nr_hwq_lock);
+		__del_gendisk(disk);
+		up_read(&set->update_nr_hwq_lock);
+		memalloc_noio_restore(memflags);
+	}
+}
 EXPORT_SYMBOL(del_gendisk);
 
 /**
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 8eb9b3310167..ef84d53095a6 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -9,6 +9,7 @@
 #include <linux/prefetch.h>
 #include <linux/srcu.h>
 #include <linux/rw_hint.h>
+#include <linux/rwsem.h>
 
 struct blk_mq_tags;
 struct blk_flush_queue;
@@ -527,6 +528,8 @@ struct blk_mq_tag_set {
 	struct mutex		tag_list_lock;
 	struct list_head	tag_list;
 	struct srcu_struct	*srcu;
+
+	struct rw_semaphore	update_nr_hwq_lock;
 };
 
 /**
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH V5 09/25] block: don't allow to switch elevator if updating nr_hw_queues is in-progress
  2025-05-05 14:17 [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Ming Lei
                   ` (7 preceding siblings ...)
  2025-05-05 14:17 ` [PATCH V5 08/25] block: prevent adding/deleting disk during updating nr_hw_queues Ming Lei
@ 2025-05-05 14:17 ` Ming Lei
  2025-05-06  4:41   ` Christoph Hellwig
  2025-05-06  6:26   ` Nilay Shroff
  2025-05-05 14:17 ` [PATCH V5 10/25] block: look up the elevator type in elevator_switch Ming Lei
                   ` (16 subsequent siblings)
  25 siblings, 2 replies; 42+ messages in thread
From: Ming Lei @ 2025-05-05 14:17 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Nilay Shroff, Shinichiro Kawasaki, Thomas Hellström,
	Christoph Hellwig, Ming Lei, Hannes Reinecke

Elevator switch code is another `nr_hw_queue` reader in non-fast-IO code
path, so it can't be done if updating `nr_hw_queues` is in-progress.

Take same approach with not allowing add/del disk when updating
nr_hw_queues is in-progress, by grabbing read lock of
set->update_nr_hwq_sema.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/linux-block/aAWv3NPtNIKKvJZc@fedora/ [1]
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Closes: https://lore.kernel.org/linux-block/mz4t4tlwiqjijw3zvqnjb7ovvvaegkqganegmmlc567tt5xj67@xal5ro544cnc/
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/elevator.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/elevator.c b/block/elevator.c
index 4400eb8fe54f..2e18513dcd73 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -723,6 +723,7 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
 	int ret;
 	unsigned int memflags;
 	struct request_queue *q = disk->queue;
+	struct blk_mq_tag_set *set = q->tag_set;
 
 	/*
 	 * If the attribute needs to load a module, do it before freezing the
@@ -734,6 +735,7 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
 
 	elv_iosched_load_module(name);
 
+	down_read(&set->update_nr_hwq_lock);
 	memflags = blk_mq_freeze_queue(q);
 	mutex_lock(&q->elevator_lock);
 	ret = elevator_change(q, name);
@@ -741,6 +743,7 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
 		ret = count;
 	mutex_unlock(&q->elevator_lock);
 	blk_mq_unfreeze_queue(q, memflags);
+	up_read(&set->update_nr_hwq_lock);
 	return ret;
 }
 
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH V5 10/25] block: look up the elevator type in elevator_switch
  2025-05-05 14:17 [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Ming Lei
                   ` (8 preceding siblings ...)
  2025-05-05 14:17 ` [PATCH V5 09/25] block: don't allow to switch elevator if updating nr_hw_queues is in-progress Ming Lei
@ 2025-05-05 14:17 ` Ming Lei
  2025-05-05 14:17 ` [PATCH V5 11/25] block: fold elevator_disable into elevator_switch Ming Lei
                   ` (15 subsequent siblings)
  25 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2025-05-05 14:17 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Nilay Shroff, Shinichiro Kawasaki, Thomas Hellström,
	Christoph Hellwig, Hannes Reinecke, Ming Lei

From: Christoph Hellwig <hch@lst.de>

That makes the function nicely self-contained and can be used
to avoid code duplication.

Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c   |  2 +-
 block/blk.h      |  2 +-
 block/elevator.c | 18 ++++++++----------
 3 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 240afa4c6ab3..20bf2d797bdf 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -5060,7 +5060,7 @@ static void blk_mq_elv_switch_back(struct list_head *head,
 	kfree(qe);
 
 	mutex_lock(&q->elevator_lock);
-	elevator_switch(q, t);
+	elevator_switch(q, t->elevator_name);
 	/* drop the reference acquired in blk_mq_elv_switch_none */
 	elevator_put(t);
 	mutex_unlock(&q->elevator_lock);
diff --git a/block/blk.h b/block/blk.h
index 328075787814..ac90a8c5a8cf 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -322,7 +322,7 @@ bool blk_bio_list_merge(struct request_queue *q, struct list_head *list,
 
 bool blk_insert_flush(struct request *rq);
 
-int elevator_switch(struct request_queue *q, struct elevator_type *new_e);
+int elevator_switch(struct request_queue *q, const char *name);
 void elevator_disable(struct request_queue *q);
 void elevator_exit(struct request_queue *q);
 int elv_register_queue(struct request_queue *q, bool uevent);
diff --git a/block/elevator.c b/block/elevator.c
index 2e18513dcd73..286d240a3aef 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -621,13 +621,18 @@ void elevator_init_mq(struct request_queue *q)
  * If switching fails, we are most likely running out of memory and not able
  * to restore the old io scheduler, so leaving the io scheduler being none.
  */
-int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
+int elevator_switch(struct request_queue *q, const char *name)
 {
+	struct elevator_type *new_e;
 	int ret;
 
 	WARN_ON_ONCE(q->mq_freeze_depth == 0);
 	lockdep_assert_held(&q->elevator_lock);
 
+	new_e = elevator_find_get(name);
+	if (!new_e)
+		return -EINVAL;
+
 	blk_mq_quiesce_queue(q);
 
 	if (q->elevator) {
@@ -654,6 +659,7 @@ int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
 			new_e->elevator_name);
 	}
 
+	elevator_put(new_e);
 	return ret;
 }
 
@@ -679,9 +685,6 @@ void elevator_disable(struct request_queue *q)
  */
 static int elevator_change(struct request_queue *q, const char *elevator_name)
 {
-	struct elevator_type *e;
-	int ret;
-
 	/* Make sure queue is not in the middle of being removed */
 	if (!blk_queue_registered(q))
 		return -ENOENT;
@@ -695,12 +698,7 @@ static int elevator_change(struct request_queue *q, const char *elevator_name)
 	if (q->elevator && elevator_match(q->elevator->type, elevator_name))
 		return 0;
 
-	e = elevator_find_get(elevator_name);
-	if (!e)
-		return -EINVAL;
-	ret = elevator_switch(q, e);
-	elevator_put(e);
-	return ret;
+	return elevator_switch(q, elevator_name);
 }
 
 static void elv_iosched_load_module(char *elevator_name)
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH V5 11/25] block: fold elevator_disable into elevator_switch
  2025-05-05 14:17 [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Ming Lei
                   ` (9 preceding siblings ...)
  2025-05-05 14:17 ` [PATCH V5 10/25] block: look up the elevator type in elevator_switch Ming Lei
@ 2025-05-05 14:17 ` Ming Lei
  2025-05-05 14:17 ` [PATCH V5 12/25] block: move blk_queue_registered() check into elv_iosched_store() Ming Lei
                   ` (14 subsequent siblings)
  25 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2025-05-05 14:17 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Nilay Shroff, Shinichiro Kawasaki, Thomas Hellström,
	Christoph Hellwig, Hannes Reinecke, Ming Lei

From: Christoph Hellwig <hch@lst.de>

This removes duplicate code, and keeps the callers tidy.

Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/elevator.c | 61 ++++++++++++++++++------------------------------
 1 file changed, 23 insertions(+), 38 deletions(-)

diff --git a/block/elevator.c b/block/elevator.c
index 286d240a3aef..766deaf34214 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -623,15 +623,17 @@ void elevator_init_mq(struct request_queue *q)
  */
 int elevator_switch(struct request_queue *q, const char *name)
 {
-	struct elevator_type *new_e;
-	int ret;
+	struct elevator_type *new_e = NULL;
+	int ret = 0;
 
 	WARN_ON_ONCE(q->mq_freeze_depth == 0);
 	lockdep_assert_held(&q->elevator_lock);
 
-	new_e = elevator_find_get(name);
-	if (!new_e)
-		return -EINVAL;
+	if (strncmp(name, "none", 4)) {
+		new_e = elevator_find_get(name);
+		if (!new_e)
+			return -EINVAL;
+	}
 
 	blk_mq_quiesce_queue(q);
 
@@ -640,16 +642,21 @@ int elevator_switch(struct request_queue *q, const char *name)
 		elevator_exit(q);
 	}
 
-	ret = blk_mq_init_sched(q, new_e);
-	if (ret)
-		goto out_unfreeze;
-
-	ret = elv_register_queue(q, true);
-	if (ret) {
-		elevator_exit(q);
-		goto out_unfreeze;
+	if (new_e) {
+		ret = blk_mq_init_sched(q, new_e);
+		if (ret)
+			goto out_unfreeze;
+		ret = elv_register_queue(q, true);
+		if (ret) {
+			elevator_exit(q);
+			goto out_unfreeze;
+		}
+	} else {
+		blk_queue_flag_clear(QUEUE_FLAG_SQ_SCHED, q);
+		q->elevator = NULL;
+		q->nr_requests = q->tag_set->queue_depth;
 	}
-	blk_add_trace_msg(q, "elv switch: %s", new_e->elevator_name);
+	blk_add_trace_msg(q, "elv switch: %s", name);
 
 out_unfreeze:
 	blk_mq_unquiesce_queue(q);
@@ -659,27 +666,11 @@ int elevator_switch(struct request_queue *q, const char *name)
 			new_e->elevator_name);
 	}
 
-	elevator_put(new_e);
+	if (new_e)
+		elevator_put(new_e);
 	return ret;
 }
 
-void elevator_disable(struct request_queue *q)
-{
-	WARN_ON_ONCE(q->mq_freeze_depth == 0);
-	lockdep_assert_held(&q->elevator_lock);
-
-	blk_mq_quiesce_queue(q);
-
-	elv_unregister_queue(q);
-	elevator_exit(q);
-	blk_queue_flag_clear(QUEUE_FLAG_SQ_SCHED, q);
-	q->elevator = NULL;
-	q->nr_requests = q->tag_set->queue_depth;
-	blk_add_trace_msg(q, "elv switch: none");
-
-	blk_mq_unquiesce_queue(q);
-}
-
 /*
  * Switch this queue to the given IO scheduler.
  */
@@ -689,12 +680,6 @@ static int elevator_change(struct request_queue *q, const char *elevator_name)
 	if (!blk_queue_registered(q))
 		return -ENOENT;
 
-	if (!strncmp(elevator_name, "none", 4)) {
-		if (q->elevator)
-			elevator_disable(q);
-		return 0;
-	}
-
 	if (q->elevator && elevator_match(q->elevator->type, elevator_name))
 		return 0;
 
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH V5 12/25] block: move blk_queue_registered() check into elv_iosched_store()
  2025-05-05 14:17 [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Ming Lei
                   ` (10 preceding siblings ...)
  2025-05-05 14:17 ` [PATCH V5 11/25] block: fold elevator_disable into elevator_switch Ming Lei
@ 2025-05-05 14:17 ` Ming Lei
  2025-05-06  4:41   ` Christoph Hellwig
  2025-05-06  7:47   ` Nilay Shroff
  2025-05-05 14:17 ` [PATCH V5 13/25] block: simplify elevator reattachment for updating nr_hw_queues Ming Lei
                   ` (13 subsequent siblings)
  25 siblings, 2 replies; 42+ messages in thread
From: Ming Lei @ 2025-05-05 14:17 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Nilay Shroff, Shinichiro Kawasaki, Thomas Hellström,
	Christoph Hellwig, Ming Lei, Hannes Reinecke

Move blk_queue_registered() check into elv_iosched_store() and prepare
for using elevator_change() for covering any kind of elevator change in
adding/deleting disk and updating nr_hw_queue.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/elevator.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/elevator.c b/block/elevator.c
index 766deaf34214..4e58379c4d88 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -676,10 +676,6 @@ int elevator_switch(struct request_queue *q, const char *name)
  */
 static int elevator_change(struct request_queue *q, const char *elevator_name)
 {
-	/* Make sure queue is not in the middle of being removed */
-	if (!blk_queue_registered(q))
-		return -ENOENT;
-
 	if (q->elevator && elevator_match(q->elevator->type, elevator_name))
 		return 0;
 
@@ -708,6 +704,10 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
 	struct request_queue *q = disk->queue;
 	struct blk_mq_tag_set *set = q->tag_set;
 
+	/* Make sure queue is not in the middle of being removed */
+	if (!blk_queue_registered(q))
+		return -ENOENT;
+
 	/*
 	 * If the attribute needs to load a module, do it before freezing the
 	 * queue to ensure that the module file can be read when the request
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH V5 13/25] block: simplify elevator reattachment for updating nr_hw_queues
  2025-05-05 14:17 [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Ming Lei
                   ` (11 preceding siblings ...)
  2025-05-05 14:17 ` [PATCH V5 12/25] block: move blk_queue_registered() check into elv_iosched_store() Ming Lei
@ 2025-05-05 14:17 ` Ming Lei
  2025-05-05 14:17 ` [PATCH V5 14/25] block: move queue freezing & elevator_lock into elevator_change() Ming Lei
                   ` (12 subsequent siblings)
  25 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2025-05-05 14:17 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Nilay Shroff, Shinichiro Kawasaki, Thomas Hellström,
	Christoph Hellwig, Ming Lei, Hannes Reinecke

In blk_mq_update_nr_hw_queues(), nr_hw_queues changes and elevator data
depends on it, and elevator has to be reattached, so call elevator_switch()
to force attachment.

Add elv_update_nr_hw_queues() simply for blk_mq_update_nr_hw_queues() to
reattach elevator, since elevator switch isn't likely when running
blk_mq_update_nr_hw_queues(). This way removes the current switch
none and switch back code.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c   | 90 +-----------------------------------------------
 block/blk.h      |  3 +-
 block/elevator.c | 20 ++++++++++-
 3 files changed, 21 insertions(+), 92 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 20bf2d797bdf..29f67b0e1fd5 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4989,88 +4989,10 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 	return ret;
 }
 
-/*
- * request_queue and elevator_type pair.
- * It is just used by __blk_mq_update_nr_hw_queues to cache
- * the elevator_type associated with a request_queue.
- */
-struct blk_mq_qe_pair {
-	struct list_head node;
-	struct request_queue *q;
-	struct elevator_type *type;
-};
-
-/*
- * Cache the elevator_type in qe pair list and switch the
- * io scheduler to 'none'
- */
-static bool blk_mq_elv_switch_none(struct list_head *head,
-		struct request_queue *q)
-{
-	struct blk_mq_qe_pair *qe;
-
-	qe = kmalloc(sizeof(*qe), GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY);
-	if (!qe)
-		return false;
-
-	/* Accessing q->elevator needs protection from ->elevator_lock. */
-	mutex_lock(&q->elevator_lock);
-
-	if (!q->elevator) {
-		kfree(qe);
-		goto unlock;
-	}
-
-	INIT_LIST_HEAD(&qe->node);
-	qe->q = q;
-	qe->type = q->elevator->type;
-	/* keep a reference to the elevator module as we'll switch back */
-	__elevator_get(qe->type);
-	list_add(&qe->node, head);
-	elevator_disable(q);
-unlock:
-	mutex_unlock(&q->elevator_lock);
-
-	return true;
-}
-
-static struct blk_mq_qe_pair *blk_lookup_qe_pair(struct list_head *head,
-						struct request_queue *q)
-{
-	struct blk_mq_qe_pair *qe;
-
-	list_for_each_entry(qe, head, node)
-		if (qe->q == q)
-			return qe;
-
-	return NULL;
-}
-
-static void blk_mq_elv_switch_back(struct list_head *head,
-				  struct request_queue *q)
-{
-	struct blk_mq_qe_pair *qe;
-	struct elevator_type *t;
-
-	qe = blk_lookup_qe_pair(head, q);
-	if (!qe)
-		return;
-	t = qe->type;
-	list_del(&qe->node);
-	kfree(qe);
-
-	mutex_lock(&q->elevator_lock);
-	elevator_switch(q, t->elevator_name);
-	/* drop the reference acquired in blk_mq_elv_switch_none */
-	elevator_put(t);
-	mutex_unlock(&q->elevator_lock);
-}
-
 static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 							int nr_hw_queues)
 {
 	struct request_queue *q;
-	LIST_HEAD(head);
 	int prev_nr_hw_queues = set->nr_hw_queues;
 	unsigned int memflags;
 	int i;
@@ -5088,15 +5010,6 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 	list_for_each_entry(q, &set->tag_list, tag_set_list)
 		blk_mq_freeze_queue_nomemsave(q);
 
-	/*
-	 * Switch IO scheduler to 'none', cleaning up the data associated
-	 * with the previous scheduler. We will switch back once we are done
-	 * updating the new sw to hw queue mappings.
-	 */
-	list_for_each_entry(q, &set->tag_list, tag_set_list)
-		if (!blk_mq_elv_switch_none(&head, q))
-			goto switch_back;
-
 	list_for_each_entry(q, &set->tag_list, tag_set_list) {
 		blk_mq_debugfs_unregister_hctxs(q);
 		blk_mq_sysfs_unregister_hctxs(q);
@@ -5130,9 +5043,8 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 		blk_mq_debugfs_register_hctxs(q);
 	}
 
-switch_back:
 	list_for_each_entry(q, &set->tag_list, tag_set_list)
-		blk_mq_elv_switch_back(&head, q);
+		elv_update_nr_hw_queues(q);
 
 	list_for_each_entry(q, &set->tag_list, tag_set_list)
 		blk_mq_unfreeze_queue_nomemrestore(q);
diff --git a/block/blk.h b/block/blk.h
index ac90a8c5a8cf..98fd3a6f5ec9 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -322,8 +322,7 @@ bool blk_bio_list_merge(struct request_queue *q, struct list_head *list,
 
 bool blk_insert_flush(struct request *rq);
 
-int elevator_switch(struct request_queue *q, const char *name);
-void elevator_disable(struct request_queue *q);
+void elv_update_nr_hw_queues(struct request_queue *q);
 void elevator_exit(struct request_queue *q);
 int elv_register_queue(struct request_queue *q, bool uevent);
 void elv_unregister_queue(struct request_queue *q);
diff --git a/block/elevator.c b/block/elevator.c
index 4e58379c4d88..cabd7a8bb76c 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -621,7 +621,7 @@ void elevator_init_mq(struct request_queue *q)
  * If switching fails, we are most likely running out of memory and not able
  * to restore the old io scheduler, so leaving the io scheduler being none.
  */
-int elevator_switch(struct request_queue *q, const char *name)
+static int elevator_switch(struct request_queue *q, const char *name)
 {
 	struct elevator_type *new_e = NULL;
 	int ret = 0;
@@ -682,6 +682,24 @@ static int elevator_change(struct request_queue *q, const char *elevator_name)
 	return elevator_switch(q, elevator_name);
 }
 
+/*
+ * The I/O scheduler depends on the number of hardware queues, this forces a
+ * reattachment when nr_hw_queues changes.
+ */
+void elv_update_nr_hw_queues(struct request_queue *q)
+{
+	WARN_ON_ONCE(q->mq_freeze_depth == 0);
+
+	mutex_lock(&q->elevator_lock);
+	if (q->elevator && !blk_queue_dying(q) && !blk_queue_registered(q)) {
+		const char *name = q->elevator->type->elevator_name;
+
+		/* force to reattach elevator after nr_hw_queue is updated */
+		elevator_switch(q, name);
+	}
+	mutex_unlock(&q->elevator_lock);
+}
+
 static void elv_iosched_load_module(char *elevator_name)
 {
 	struct elevator_type *found;
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH V5 14/25] block: move queue freezing & elevator_lock into elevator_change()
  2025-05-05 14:17 [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Ming Lei
                   ` (12 preceding siblings ...)
  2025-05-05 14:17 ` [PATCH V5 13/25] block: simplify elevator reattachment for updating nr_hw_queues Ming Lei
@ 2025-05-05 14:17 ` Ming Lei
  2025-05-05 14:17 ` [PATCH V5 15/25] block: add `struct elv_change_ctx` for unifying elevator change Ming Lei
                   ` (11 subsequent siblings)
  25 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2025-05-05 14:17 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Nilay Shroff, Shinichiro Kawasaki, Thomas Hellström,
	Christoph Hellwig, Ming Lei, Hannes Reinecke

Move queue freezing & elevator_lock into elevator_change(), and prepare
for using elevator_change() for setting up & tearing down default elevator
too.

Also add lockdep_assert_held() in __elevator_change() because either
read or write lock is required for changing elevator.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/elevator.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/block/elevator.c b/block/elevator.c
index cabd7a8bb76c..cb54a3791fe5 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -676,10 +676,19 @@ static int elevator_switch(struct request_queue *q, const char *name)
  */
 static int elevator_change(struct request_queue *q, const char *elevator_name)
 {
-	if (q->elevator && elevator_match(q->elevator->type, elevator_name))
-		return 0;
+	unsigned int memflags;
+	int ret = 0;
 
-	return elevator_switch(q, elevator_name);
+	lockdep_assert_held(&q->tag_set->update_nr_hwq_lock);
+
+	memflags = blk_mq_freeze_queue(q);
+	mutex_lock(&q->elevator_lock);
+	if (!(q->elevator && elevator_match(q->elevator->type,
+				elevator_name)))
+		ret = elevator_switch(q, elevator_name);
+	mutex_unlock(&q->elevator_lock);
+	blk_mq_unfreeze_queue(q, memflags);
+	return ret;
 }
 
 /*
@@ -718,7 +727,6 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
 	char elevator_name[ELV_NAME_MAX];
 	char *name;
 	int ret;
-	unsigned int memflags;
 	struct request_queue *q = disk->queue;
 	struct blk_mq_tag_set *set = q->tag_set;
 
@@ -737,13 +745,9 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
 	elv_iosched_load_module(name);
 
 	down_read(&set->update_nr_hwq_lock);
-	memflags = blk_mq_freeze_queue(q);
-	mutex_lock(&q->elevator_lock);
 	ret = elevator_change(q, name);
 	if (!ret)
 		ret = count;
-	mutex_unlock(&q->elevator_lock);
-	blk_mq_unfreeze_queue(q, memflags);
 	up_read(&set->update_nr_hwq_lock);
 	return ret;
 }
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH V5 15/25] block: add `struct elv_change_ctx` for unifying elevator change
  2025-05-05 14:17 [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Ming Lei
                   ` (13 preceding siblings ...)
  2025-05-05 14:17 ` [PATCH V5 14/25] block: move queue freezing & elevator_lock into elevator_change() Ming Lei
@ 2025-05-05 14:17 ` Ming Lei
  2025-05-06  4:42   ` Christoph Hellwig
  2025-05-05 14:17 ` [PATCH V5 16/25] block: " Ming Lei
                   ` (10 subsequent siblings)
  25 siblings, 1 reply; 42+ messages in thread
From: Ming Lei @ 2025-05-05 14:17 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Nilay Shroff, Shinichiro Kawasaki, Thomas Hellström,
	Christoph Hellwig, Ming Lei, Hannes Reinecke

Add `struct elv_change_ctx` and prepare for unifying elevator change by
elevator_change(). With this way, any input & output parameter can
be provided & observed in top helper.

This way helps to move kobject add/delete & debugfs register/unregister
out of ->elevator_lock & freezing queue.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/elevator.c | 39 +++++++++++++++++++++++----------------
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/block/elevator.c b/block/elevator.c
index cb54a3791fe5..6cfac8f77d9f 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -45,6 +45,12 @@
 #include "blk-wbt.h"
 #include "blk-cgroup.h"
 
+/* Holding context data for changing elevator */
+struct elv_change_ctx {
+	const char *name;
+	bool no_uevent;
+};
+
 static DEFINE_SPINLOCK(elv_list_lock);
 static LIST_HEAD(elv_list);
 
@@ -621,7 +627,7 @@ void elevator_init_mq(struct request_queue *q)
  * If switching fails, we are most likely running out of memory and not able
  * to restore the old io scheduler, so leaving the io scheduler being none.
  */
-static int elevator_switch(struct request_queue *q, const char *name)
+static int elevator_switch(struct request_queue *q, struct elv_change_ctx *ctx)
 {
 	struct elevator_type *new_e = NULL;
 	int ret = 0;
@@ -629,8 +635,8 @@ static int elevator_switch(struct request_queue *q, const char *name)
 	WARN_ON_ONCE(q->mq_freeze_depth == 0);
 	lockdep_assert_held(&q->elevator_lock);
 
-	if (strncmp(name, "none", 4)) {
-		new_e = elevator_find_get(name);
+	if (strncmp(ctx->name, "none", 4)) {
+		new_e = elevator_find_get(ctx->name);
 		if (!new_e)
 			return -EINVAL;
 	}
@@ -646,7 +652,7 @@ static int elevator_switch(struct request_queue *q, const char *name)
 		ret = blk_mq_init_sched(q, new_e);
 		if (ret)
 			goto out_unfreeze;
-		ret = elv_register_queue(q, true);
+		ret = elv_register_queue(q, !ctx->no_uevent);
 		if (ret) {
 			elevator_exit(q);
 			goto out_unfreeze;
@@ -656,7 +662,7 @@ static int elevator_switch(struct request_queue *q, const char *name)
 		q->elevator = NULL;
 		q->nr_requests = q->tag_set->queue_depth;
 	}
-	blk_add_trace_msg(q, "elv switch: %s", name);
+	blk_add_trace_msg(q, "elv switch: %s", ctx->name);
 
 out_unfreeze:
 	blk_mq_unquiesce_queue(q);
@@ -674,7 +680,7 @@ static int elevator_switch(struct request_queue *q, const char *name)
 /*
  * Switch this queue to the given IO scheduler.
  */
-static int elevator_change(struct request_queue *q, const char *elevator_name)
+static int elevator_change(struct request_queue *q, struct elv_change_ctx *ctx)
 {
 	unsigned int memflags;
 	int ret = 0;
@@ -683,9 +689,8 @@ static int elevator_change(struct request_queue *q, const char *elevator_name)
 
 	memflags = blk_mq_freeze_queue(q);
 	mutex_lock(&q->elevator_lock);
-	if (!(q->elevator && elevator_match(q->elevator->type,
-				elevator_name)))
-		ret = elevator_switch(q, elevator_name);
+	if (!(q->elevator && elevator_match(q->elevator->type, ctx->name)))
+		ret = elevator_switch(q, ctx);
 	mutex_unlock(&q->elevator_lock);
 	blk_mq_unfreeze_queue(q, memflags);
 	return ret;
@@ -701,15 +706,17 @@ void elv_update_nr_hw_queues(struct request_queue *q)
 
 	mutex_lock(&q->elevator_lock);
 	if (q->elevator && !blk_queue_dying(q) && !blk_queue_registered(q)) {
-		const char *name = q->elevator->type->elevator_name;
+		struct elv_change_ctx ctx = {
+			.name = q->elevator->type->elevator_name,
+		};
 
 		/* force to reattach elevator after nr_hw_queue is updated */
-		elevator_switch(q, name);
+		elevator_switch(q, &ctx);
 	}
 	mutex_unlock(&q->elevator_lock);
 }
 
-static void elv_iosched_load_module(char *elevator_name)
+static void elv_iosched_load_module(const char *elevator_name)
 {
 	struct elevator_type *found;
 
@@ -725,7 +732,7 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
 			  size_t count)
 {
 	char elevator_name[ELV_NAME_MAX];
-	char *name;
+	struct elv_change_ctx ctx = {};
 	int ret;
 	struct request_queue *q = disk->queue;
 	struct blk_mq_tag_set *set = q->tag_set;
@@ -740,12 +747,12 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
 	 * queue is the one for the device storing the module file.
 	 */
 	strscpy(elevator_name, buf, sizeof(elevator_name));
-	name = strstrip(elevator_name);
+	ctx.name = strstrip(elevator_name);
 
-	elv_iosched_load_module(name);
+	elv_iosched_load_module(ctx.name);
 
 	down_read(&set->update_nr_hwq_lock);
-	ret = elevator_change(q, name);
+	ret = elevator_change(q, &ctx);
 	if (!ret)
 		ret = count;
 	up_read(&set->update_nr_hwq_lock);
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH V5 16/25] block: unifying elevator change
  2025-05-05 14:17 [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Ming Lei
                   ` (14 preceding siblings ...)
  2025-05-05 14:17 ` [PATCH V5 15/25] block: add `struct elv_change_ctx` for unifying elevator change Ming Lei
@ 2025-05-05 14:17 ` Ming Lei
  2025-05-05 14:17 ` [PATCH V5 17/25] block: pass elevator_queue to elv_register_queue & unregister_queue Ming Lei
                   ` (9 subsequent siblings)
  25 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2025-05-05 14:17 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Nilay Shroff, Shinichiro Kawasaki, Thomas Hellström,
	Christoph Hellwig, Ming Lei, Hannes Reinecke

Elevator change is one well-define behavior:

- tear down current elevator if it exists

- setup new elevator

It is supposed to cover any case for changing elevator by single
internal API, typically the following cases:

- setup default elevator in add_disk()

- switch to none in del_disk()

- reset elevator in blk_mq_update_nr_hw_queues()

- switch elevator in sysfs `store` elevator attribute

This patch uses elevator_change() to cover all above cases:

- every elevator switch is serialized with each other: add_disk/del_disk/
store elevator is serialized already, blk_mq_update_nr_hw_queues() uses
srcu for syncing with the other three cases

- for both add_disk()/del_disk(), queue freeze works at atomic mode
or has been froze, so the freeze in elevator_change() won't add extra
delay

- `struct elev_change_ctx` instance holds any info for changing elevator

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-sysfs.c |  19 +++-----
 block/blk.h       |   5 +-
 block/elevator.c  | 116 +++++++++++++++++++++-------------------------
 block/genhd.c     |  28 ++---------
 4 files changed, 67 insertions(+), 101 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 1f9b45b0b9ee..741e607dfab6 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -869,14 +869,9 @@ int blk_register_queue(struct gendisk *disk)
 	if (ret)
 		goto out_unregister_ia_ranges;
 
+	if (queue_is_mq(q))
+		elevator_set_default(q);
 	mutex_lock(&q->elevator_lock);
-	if (q->elevator) {
-		ret = elv_register_queue(q, false);
-		if (ret) {
-			mutex_unlock(&q->elevator_lock);
-			goto out_crypto_sysfs_unregister;
-		}
-	}
 	wbt_enable_default(disk);
 	mutex_unlock(&q->elevator_lock);
 
@@ -902,8 +897,6 @@ int blk_register_queue(struct gendisk *disk)
 
 	return ret;
 
-out_crypto_sysfs_unregister:
-	blk_crypto_sysfs_unregister(disk);
 out_unregister_ia_ranges:
 	disk_unregister_independent_access_ranges(disk);
 out_debugfs_remove:
@@ -951,9 +944,11 @@ void blk_unregister_queue(struct gendisk *disk)
 		blk_mq_sysfs_unregister(disk);
 	blk_crypto_sysfs_unregister(disk);
 
-	mutex_lock(&q->elevator_lock);
-	elv_unregister_queue(q);
-	mutex_unlock(&q->elevator_lock);
+	if (queue_is_mq(q)) {
+		blk_mq_quiesce_queue(q);
+		elevator_set_none(q);
+		blk_mq_unquiesce_queue(q);
+	}
 
 	mutex_lock(&q->sysfs_lock);
 	disk_unregister_independent_access_ranges(disk);
diff --git a/block/blk.h b/block/blk.h
index 98fd3a6f5ec9..23b6ed27e5d1 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -323,9 +323,8 @@ bool blk_bio_list_merge(struct request_queue *q, struct list_head *list,
 bool blk_insert_flush(struct request *rq);
 
 void elv_update_nr_hw_queues(struct request_queue *q);
-void elevator_exit(struct request_queue *q);
-int elv_register_queue(struct request_queue *q, bool uevent);
-void elv_unregister_queue(struct request_queue *q);
+void elevator_set_default(struct request_queue *q);
+void elevator_set_none(struct request_queue *q);
 
 ssize_t part_size_show(struct device *dev, struct device_attribute *attr,
 		char *buf);
diff --git a/block/elevator.c b/block/elevator.c
index 6cfac8f77d9f..540542cee21c 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -154,7 +154,7 @@ static void elevator_release(struct kobject *kobj)
 	kfree(e);
 }
 
-void elevator_exit(struct request_queue *q)
+static void elevator_exit(struct request_queue *q)
 {
 	struct elevator_queue *e = q->elevator;
 
@@ -458,7 +458,7 @@ static const struct kobj_type elv_ktype = {
 	.release	= elevator_release,
 };
 
-int elv_register_queue(struct request_queue *q, bool uevent)
+static int elv_register_queue(struct request_queue *q, bool uevent)
 {
 	struct elevator_queue *e = q->elevator;
 	int error;
@@ -488,7 +488,7 @@ int elv_register_queue(struct request_queue *q, bool uevent)
 	return error;
 }
 
-void elv_unregister_queue(struct request_queue *q)
+static void elv_unregister_queue(struct request_queue *q)
 {
 	struct elevator_queue *e = q->elevator;
 
@@ -561,66 +561,6 @@ void elv_unregister(struct elevator_type *e)
 }
 EXPORT_SYMBOL_GPL(elv_unregister);
 
-/*
- * For single queue devices, default to using mq-deadline. If we have multiple
- * queues or mq-deadline is not available, default to "none".
- */
-static struct elevator_type *elevator_get_default(struct request_queue *q)
-{
-	if (q->tag_set->flags & BLK_MQ_F_NO_SCHED_BY_DEFAULT)
-		return NULL;
-
-	if (q->nr_hw_queues != 1 &&
-	    !blk_mq_is_shared_tags(q->tag_set->flags))
-		return NULL;
-
-	return elevator_find_get("mq-deadline");
-}
-
-/*
- * Use the default elevator settings. If the chosen elevator initialization
- * fails, fall back to the "none" elevator (no elevator).
- */
-void elevator_init_mq(struct request_queue *q)
-{
-	struct elevator_type *e;
-	unsigned int memflags;
-	int err;
-
-	WARN_ON_ONCE(blk_queue_registered(q));
-
-	if (unlikely(q->elevator))
-		return;
-
-	e = elevator_get_default(q);
-	if (!e)
-		return;
-
-	/*
-	 * We are called before adding disk, when there isn't any FS I/O,
-	 * so freezing queue plus canceling dispatch work is enough to
-	 * drain any dispatch activities originated from passthrough
-	 * requests, then no need to quiesce queue which may add long boot
-	 * latency, especially when lots of disks are involved.
-	 *
-	 * Disk isn't added yet, so verifying queue lock only manually.
-	 */
-	memflags = blk_mq_freeze_queue(q);
-
-	blk_mq_cancel_work_sync(q);
-
-	err = blk_mq_init_sched(q, e);
-
-	blk_mq_unfreeze_queue(q, memflags);
-
-	if (err) {
-		pr_warn("\"%s\" elevator initialization failed, "
-			"falling back to \"none\"\n", e->elevator_name);
-	}
-
-	elevator_put(e);
-}
-
 /*
  * Switch to new_e io scheduler.
  *
@@ -688,6 +628,16 @@ static int elevator_change(struct request_queue *q, struct elv_change_ctx *ctx)
 	lockdep_assert_held(&q->tag_set->update_nr_hwq_lock);
 
 	memflags = blk_mq_freeze_queue(q);
+	/*
+	 * May be called before adding disk, when there isn't any FS I/O,
+	 * so freezing queue plus canceling dispatch work is enough to
+	 * drain any dispatch activities originated from passthrough
+	 * requests, then no need to quiesce queue which may add long boot
+	 * latency, especially when lots of disks are involved.
+	 *
+	 * Disk isn't added yet, so verifying queue lock only manually.
+	 */
+	blk_mq_cancel_work_sync(q);
 	mutex_lock(&q->elevator_lock);
 	if (!(q->elevator && elevator_match(q->elevator->type, ctx->name)))
 		ret = elevator_switch(q, ctx);
@@ -716,6 +666,46 @@ void elv_update_nr_hw_queues(struct request_queue *q)
 	mutex_unlock(&q->elevator_lock);
 }
 
+/*
+ * Use the default elevator settings. If the chosen elevator initialization
+ * fails, fall back to the "none" elevator (no elevator).
+ */
+void elevator_set_default(struct request_queue *q)
+{
+	struct elv_change_ctx ctx = {
+		.name = "mq-deadline",
+		.no_uevent = true,
+	};
+	int err = 0;
+
+	if (q->tag_set->flags & BLK_MQ_F_NO_SCHED_BY_DEFAULT)
+		return;
+
+	/*
+	 * For single queue devices, default to using mq-deadline. If we
+	 * have multiple queues or mq-deadline is not available, default
+	 * to "none".
+	 */
+	if (elevator_find_get(ctx.name) && (q->nr_hw_queues == 1 ||
+			 blk_mq_is_shared_tags(q->tag_set->flags)))
+		err = elevator_change(q, &ctx);
+	if (err < 0)
+		pr_warn("\"%s\" elevator initialization, failed %d, "
+			"falling back to \"none\"\n", ctx.name, err);
+}
+
+void elevator_set_none(struct request_queue *q)
+{
+	struct elv_change_ctx ctx = {
+		.name	= "none",
+	};
+	int err;
+
+	err = elevator_change(q, &ctx);
+	if (err < 0)
+		pr_warn("%s: set none elevator failed %d\n", __func__, err);
+}
+
 static void elv_iosched_load_module(const char *elevator_name)
 {
 	struct elevator_type *found;
diff --git a/block/genhd.c b/block/genhd.c
index e0dd8ecc925f..f192fe4808b9 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -432,12 +432,6 @@ static int __add_disk(struct device *parent, struct gendisk *disk,
 		 */
 		if (disk->fops->submit_bio || disk->fops->poll_bio)
 			return -EINVAL;
-
-		/*
-		 * Initialize the I/O scheduler code and pick a default one if
-		 * needed.
-		 */
-		elevator_init_mq(disk->queue);
 	} else {
 		if (!disk->fops->submit_bio)
 			return -EINVAL;
@@ -454,7 +448,7 @@ static int __add_disk(struct device *parent, struct gendisk *disk,
 	ret = -EINVAL;
 	if (disk->major) {
 		if (WARN_ON(!disk->minors))
-			goto out_exit_elevator;
+			goto out;
 
 		if (disk->minors > DISK_MAX_PARTS) {
 			pr_err("block: can't allocate more than %d partitions\n",
@@ -464,14 +458,14 @@ static int __add_disk(struct device *parent, struct gendisk *disk,
 		if (disk->first_minor > MINORMASK ||
 		    disk->minors > MINORMASK + 1 ||
 		    disk->first_minor + disk->minors > MINORMASK + 1)
-			goto out_exit_elevator;
+			goto out;
 	} else {
 		if (WARN_ON(disk->minors))
-			goto out_exit_elevator;
+			goto out;
 
 		ret = blk_alloc_ext_minor();
 		if (ret < 0)
-			goto out_exit_elevator;
+			goto out;
 		disk->major = BLOCK_EXT_MAJOR;
 		disk->first_minor = ret;
 	}
@@ -561,12 +555,7 @@ static int __add_disk(struct device *parent, struct gendisk *disk,
 out_free_ext_minor:
 	if (disk->major == BLOCK_EXT_MAJOR)
 		blk_free_ext_minor(disk->first_minor);
-out_exit_elevator:
-	if (disk->queue->elevator) {
-		mutex_lock(&disk->queue->elevator_lock);
-		elevator_exit(disk->queue);
-		mutex_unlock(&disk->queue->elevator_lock);
-	}
+out:
 	return ret;
 }
 
@@ -760,14 +749,7 @@ static void __del_gendisk(struct gendisk *disk)
 	if (queue_is_mq(q))
 		blk_mq_cancel_work_sync(q);
 
-	blk_mq_quiesce_queue(q);
-	if (q->elevator) {
-		mutex_lock(&q->elevator_lock);
-		elevator_exit(q);
-		mutex_unlock(&q->elevator_lock);
-	}
 	rq_qos_exit(q);
-	blk_mq_unquiesce_queue(q);
 
 	/*
 	 * If the disk does not own the queue, allow using passthrough requests
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH V5 17/25] block: pass elevator_queue to elv_register_queue & unregister_queue
  2025-05-05 14:17 [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Ming Lei
                   ` (15 preceding siblings ...)
  2025-05-05 14:17 ` [PATCH V5 16/25] block: " Ming Lei
@ 2025-05-05 14:17 ` Ming Lei
  2025-05-05 14:17 ` [PATCH V5 18/25] block: remove elevator queue's type check in elv_attr_show/store() Ming Lei
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2025-05-05 14:17 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Nilay Shroff, Shinichiro Kawasaki, Thomas Hellström,
	Christoph Hellwig, Ming Lei, Hannes Reinecke

Pass elevator_queue reference to elv_register_queue() & elv_unregister_queue().

No functional change, and prepare for moving the two out of elevator
lock & freezing queue, when we need to store the old & new elevator
queue in `struct elv_change_ctx` instance, then both two can co-exist
for short while, so we have to pass the exact elevator_queue instance
to elv_register_queue & unregister_queue.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/elevator.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/block/elevator.c b/block/elevator.c
index 540542cee21c..eb7140a678d5 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -458,9 +458,10 @@ static const struct kobj_type elv_ktype = {
 	.release	= elevator_release,
 };
 
-static int elv_register_queue(struct request_queue *q, bool uevent)
+static int elv_register_queue(struct request_queue *q,
+			      struct elevator_queue *e,
+			      bool uevent)
 {
-	struct elevator_queue *e = q->elevator;
 	int error;
 
 	lockdep_assert_held(&q->elevator_lock);
@@ -488,10 +489,9 @@ static int elv_register_queue(struct request_queue *q, bool uevent)
 	return error;
 }
 
-static void elv_unregister_queue(struct request_queue *q)
+static void elv_unregister_queue(struct request_queue *q,
+				 struct elevator_queue *e)
 {
-	struct elevator_queue *e = q->elevator;
-
 	lockdep_assert_held(&q->elevator_lock);
 
 	if (e && test_and_clear_bit(ELEVATOR_FLAG_REGISTERED, &e->flags)) {
@@ -584,7 +584,7 @@ static int elevator_switch(struct request_queue *q, struct elv_change_ctx *ctx)
 	blk_mq_quiesce_queue(q);
 
 	if (q->elevator) {
-		elv_unregister_queue(q);
+		elv_unregister_queue(q, q->elevator);
 		elevator_exit(q);
 	}
 
@@ -592,7 +592,7 @@ static int elevator_switch(struct request_queue *q, struct elv_change_ctx *ctx)
 		ret = blk_mq_init_sched(q, new_e);
 		if (ret)
 			goto out_unfreeze;
-		ret = elv_register_queue(q, !ctx->no_uevent);
+		ret = elv_register_queue(q, q->elevator, !ctx->no_uevent);
 		if (ret) {
 			elevator_exit(q);
 			goto out_unfreeze;
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH V5 18/25] block: remove elevator queue's type check in elv_attr_show/store()
  2025-05-05 14:17 [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Ming Lei
                   ` (16 preceding siblings ...)
  2025-05-05 14:17 ` [PATCH V5 17/25] block: pass elevator_queue to elv_register_queue & unregister_queue Ming Lei
@ 2025-05-05 14:17 ` Ming Lei
  2025-05-05 14:17 ` [PATCH V5 19/25] block: fail to show/store elevator sysfs attribute if elevator is dying Ming Lei
                   ` (7 subsequent siblings)
  25 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2025-05-05 14:17 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Nilay Shroff, Shinichiro Kawasaki, Thomas Hellström,
	Christoph Hellwig, Ming Lei, Hannes Reinecke

elevatore queue's type is assigned since its allocation, and never
get cleared until it is released.

So its ->type is always not NULL, remove the unnecessary check.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/elevator.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/elevator.c b/block/elevator.c
index eb7140a678d5..fa436417da3b 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -425,7 +425,7 @@ elv_attr_show(struct kobject *kobj, struct attribute *attr, char *page)
 
 	e = container_of(kobj, struct elevator_queue, kobj);
 	mutex_lock(&e->sysfs_lock);
-	error = e->type ? entry->show(e, page) : -ENOENT;
+	error = entry->show(e, page);
 	mutex_unlock(&e->sysfs_lock);
 	return error;
 }
@@ -443,7 +443,7 @@ elv_attr_store(struct kobject *kobj, struct attribute *attr,
 
 	e = container_of(kobj, struct elevator_queue, kobj);
 	mutex_lock(&e->sysfs_lock);
-	error = e->type ? entry->store(e, page, length) : -ENOENT;
+	error = entry->store(e, page, length);
 	mutex_unlock(&e->sysfs_lock);
 	return error;
 }
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH V5 19/25] block: fail to show/store elevator sysfs attribute if elevator is dying
  2025-05-05 14:17 [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Ming Lei
                   ` (17 preceding siblings ...)
  2025-05-05 14:17 ` [PATCH V5 18/25] block: remove elevator queue's type check in elv_attr_show/store() Ming Lei
@ 2025-05-05 14:17 ` Ming Lei
  2025-05-06 11:09   ` Hannes Reinecke
  2025-05-05 14:17 ` [PATCH V5 20/25] block: add new helper for disabling elevator switch when deleting disk Ming Lei
                   ` (6 subsequent siblings)
  25 siblings, 1 reply; 42+ messages in thread
From: Ming Lei @ 2025-05-05 14:17 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Nilay Shroff, Shinichiro Kawasaki, Thomas Hellström,
	Christoph Hellwig, Ming Lei

Prepare for moving elv_register[unregister]_queue out of elevator_lock
& queue freezing, so we may have to call elv_unregister_queue() after
elevator ->exit() is called, then there is small window for user to
call into ->show()/store(), and user-after-free can be caused.

Fail to show/store elevator sysfs attribute if elevator is dying by
adding one new flag of ELEVATOR_FLAG_DYNG, which is protected by
elevator ->sysfs_lock.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.c |  1 +
 block/elevator.c     | 10 ++++++----
 block/elevator.h     |  1 +
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 336a15ffecfa..55a0fd105147 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -551,5 +551,6 @@ void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e)
 	if (e->type->ops.exit_sched)
 		e->type->ops.exit_sched(e);
 	blk_mq_sched_tags_teardown(q, flags);
+	set_bit(ELEVATOR_FLAG_DYING, &q->elevator->flags);
 	q->elevator = NULL;
 }
diff --git a/block/elevator.c b/block/elevator.c
index fa436417da3b..2edaf84900fc 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -418,14 +418,15 @@ elv_attr_show(struct kobject *kobj, struct attribute *attr, char *page)
 {
 	const struct elv_fs_entry *entry = to_elv(attr);
 	struct elevator_queue *e;
-	ssize_t error;
+	ssize_t error = -ENODEV;
 
 	if (!entry->show)
 		return -EIO;
 
 	e = container_of(kobj, struct elevator_queue, kobj);
 	mutex_lock(&e->sysfs_lock);
-	error = entry->show(e, page);
+	if (!test_bit(ELEVATOR_FLAG_DYING, &e->flags))
+		error = entry->show(e, page);
 	mutex_unlock(&e->sysfs_lock);
 	return error;
 }
@@ -436,14 +437,15 @@ elv_attr_store(struct kobject *kobj, struct attribute *attr,
 {
 	const struct elv_fs_entry *entry = to_elv(attr);
 	struct elevator_queue *e;
-	ssize_t error;
+	ssize_t error = -ENODEV;
 
 	if (!entry->store)
 		return -EIO;
 
 	e = container_of(kobj, struct elevator_queue, kobj);
 	mutex_lock(&e->sysfs_lock);
-	error = entry->store(e, page, length);
+	if (!test_bit(ELEVATOR_FLAG_DYING, &e->flags))
+		error = entry->store(e, page, length);
 	mutex_unlock(&e->sysfs_lock);
 	return error;
 }
diff --git a/block/elevator.h b/block/elevator.h
index 9198676644a9..76a90a1b7ed6 100644
--- a/block/elevator.h
+++ b/block/elevator.h
@@ -121,6 +121,7 @@ struct elevator_queue
 };
 
 #define ELEVATOR_FLAG_REGISTERED	0
+#define ELEVATOR_FLAG_DYING		1
 
 /*
  * block elevator interface
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH V5 20/25] block: add new helper for disabling elevator switch when deleting disk
  2025-05-05 14:17 [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Ming Lei
                   ` (18 preceding siblings ...)
  2025-05-05 14:17 ` [PATCH V5 19/25] block: fail to show/store elevator sysfs attribute if elevator is dying Ming Lei
@ 2025-05-05 14:17 ` Ming Lei
  2025-05-06  6:32   ` Nilay Shroff
  2025-05-05 14:17 ` [PATCH V5 21/25] block: move elv_register[unregister]_queue out of elevator_lock Ming Lei
                   ` (5 subsequent siblings)
  25 siblings, 1 reply; 42+ messages in thread
From: Ming Lei @ 2025-05-05 14:17 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Nilay Shroff, Shinichiro Kawasaki, Thomas Hellström,
	Christoph Hellwig, Ming Lei, Hannes Reinecke

Add new helper disable_elv_switch() and new flag QUEUE_FLAG_NO_ELV_SWITCH
for disabling elevator switch before deleting disk:

- originally flag QUEUE_FLAG_REGISTERED is added for preventing elevator
switch during removing disk, but this flag has been used widely for
other purposes, so add one new flag for disabling elevator switch only

- for avoiding deadlock risk, we have to move elevator queue
register/unregister out of elevator lock and queue freeze, which will be
done in next patch. However, this way adds small race window between elevator
switch and deleting ->queue_kobj, in which elevator queue register/unregister
could be run concurrently. The added helper will be used for avoiding the race
in the following patch.

- drain in-progress elevator switch before deleting disk

Suggested-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-debugfs.c |  1 +
 block/elevator.c       | 13 ++++++++++---
 block/genhd.c          | 13 +++++++++++++
 include/linux/blkdev.h |  3 +++
 4 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 2837a8ce8054..29b3540dd180 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -94,6 +94,7 @@ static const char *const blk_queue_flag_name[] = {
 	QUEUE_FLAG_NAME(HCTX_ACTIVE),
 	QUEUE_FLAG_NAME(SQ_SCHED),
 	QUEUE_FLAG_NAME(DISABLE_WBT_DEF),
+	QUEUE_FLAG_NAME(NO_ELV_SWITCH),
 };
 #undef QUEUE_FLAG_NAME
 
diff --git a/block/elevator.c b/block/elevator.c
index 2edaf84900fc..f7e333abefe3 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -680,6 +680,9 @@ void elevator_set_default(struct request_queue *q)
 	};
 	int err = 0;
 
+	/* now we allow to switch elevator */
+	blk_queue_flag_clear(QUEUE_FLAG_NO_ELV_SWITCH, q);
+
 	if (q->tag_set->flags & BLK_MQ_F_NO_SCHED_BY_DEFAULT)
 		return;
 
@@ -744,9 +747,13 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
 	elv_iosched_load_module(ctx.name);
 
 	down_read(&set->update_nr_hwq_lock);
-	ret = elevator_change(q, &ctx);
-	if (!ret)
-		ret = count;
+	if (!blk_queue_no_elv_switch(q)) {
+		ret = elevator_change(q, &ctx);
+		if (!ret)
+			ret = count;
+	} else {
+		ret = -ENOENT;
+	}
 	up_read(&set->update_nr_hwq_lock);
 	return ret;
 }
diff --git a/block/genhd.c b/block/genhd.c
index f192fe4808b9..a8cb5607b6e3 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -764,6 +764,16 @@ static void __del_gendisk(struct gendisk *disk)
 		blk_unfreeze_release_lock(q);
 }
 
+static void disable_elv_switch(struct request_queue *q)
+{
+	struct blk_mq_tag_set *set = q->tag_set;
+	WARN_ON_ONCE(!queue_is_mq(q));
+
+	down_write(&set->update_nr_hwq_lock);
+	blk_queue_flag_set(QUEUE_FLAG_NO_ELV_SWITCH, q);
+	up_write(&set->update_nr_hwq_lock);
+}
+
 /**
  * del_gendisk - remove the gendisk
  * @disk: the struct gendisk to remove
@@ -792,6 +802,9 @@ void del_gendisk(struct gendisk *disk)
 		__del_gendisk(disk);
 	} else {
 		set = disk->queue->tag_set;
+
+		disable_elv_switch(disk->queue);
+
 		memflags = memalloc_noio_save();
 		down_read(&set->update_nr_hwq_lock);
 		__del_gendisk(disk);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 9c373cf0eb47..b15c53fabe9f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -645,6 +645,7 @@ enum {
 	QUEUE_FLAG_HCTX_ACTIVE,		/* at least one blk-mq hctx is active */
 	QUEUE_FLAG_SQ_SCHED,		/* single queue style io dispatch */
 	QUEUE_FLAG_DISABLE_WBT_DEF,	/* for sched to disable/enable wbt */
+	QUEUE_FLAG_NO_ELV_SWITCH,	/* can't switch elevator any more */
 	QUEUE_FLAG_MAX
 };
 
@@ -682,6 +683,8 @@ void blk_queue_flag_clear(unsigned int flag, struct request_queue *q);
 	((q)->limits.features & BLK_FEAT_SKIP_TAGSET_QUIESCE)
 #define blk_queue_disable_wbt(q)	\
 	test_bit(QUEUE_FLAG_DISABLE_WBT_DEF, &(q)->queue_flags)
+#define blk_queue_no_elv_switch(q)	\
+	test_bit(QUEUE_FLAG_NO_ELV_SWITCH, &(q)->queue_flags)
 
 extern void blk_set_pm_only(struct request_queue *q);
 extern void blk_clear_pm_only(struct request_queue *q);
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH V5 21/25] block: move elv_register[unregister]_queue out of elevator_lock
  2025-05-05 14:17 [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Ming Lei
                   ` (19 preceding siblings ...)
  2025-05-05 14:17 ` [PATCH V5 20/25] block: add new helper for disabling elevator switch when deleting disk Ming Lei
@ 2025-05-05 14:17 ` Ming Lei
  2025-05-06  4:43   ` Christoph Hellwig
  2025-05-06  6:36   ` Nilay Shroff
  2025-05-05 14:18 ` [PATCH V5 22/25] block: move hctx debugfs/sysfs registering out of freezing queue Ming Lei
                   ` (4 subsequent siblings)
  25 siblings, 2 replies; 42+ messages in thread
From: Ming Lei @ 2025-05-05 14:17 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Nilay Shroff, Shinichiro Kawasaki, Thomas Hellström,
	Christoph Hellwig, Ming Lei, Hannes Reinecke

Move elv_register[unregister]_queue out of ->elevator_lock & queue freezing,
so we can kill many lockdep warnings.

elv_register[unregister]_queue() is serialized, and just dealing with sysfs/
debugfs things, no need to be done with queue frozen:

- when it is called from adding disk, elevator switch isn't possible
  because ->queue_kobj isn't added yet

- when it is called from deleting disk, disable_elv_switch() is
  responsible for preventing new elevator switch and draining old
  elevator switch.

- when it is called from blk_mq_update_nr_hw_queues(), adding/removing
  disk and elevator switch can't be allowed or in-progress

With this change, elevator's ->exit() is called before calling
elv_unregister_queue, then user may call into ->show()/store() of elevator's
sysfs attributes, and we have covered this issue by adding `ELEVATOR_FLAG_DYNG`.

For blk-mq debugfs, hctx->sched_tags is always checked with ->elevator_lock by
debugfs code, meantime hctx->sched_tags is updated with ->elevator_lock, so
there isn't such issue.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c   |  3 +--
 block/elevator.c | 68 ++++++++++++++++++++++++++++++++++++------------
 2 files changed, 53 insertions(+), 18 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 29f67b0e1fd5..d1b5b75840eb 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -5043,11 +5043,10 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 		blk_mq_debugfs_register_hctxs(q);
 	}
 
+	/* elv_update_nr_hw_queues() unfreeze queue for us */
 	list_for_each_entry(q, &set->tag_list, tag_set_list)
 		elv_update_nr_hw_queues(q);
 
-	list_for_each_entry(q, &set->tag_list, tag_set_list)
-		blk_mq_unfreeze_queue_nomemrestore(q);
 	memalloc_noio_restore(memflags);
 
 	/* Free the excess tags when nr_hw_queues shrink. */
diff --git a/block/elevator.c b/block/elevator.c
index f7e333abefe3..8578b969e173 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -49,6 +49,11 @@
 struct elv_change_ctx {
 	const char *name;
 	bool no_uevent;
+
+	/* for unregistering old elevator */
+	struct elevator_queue *old;
+	/* for registering new elevator */
+	struct elevator_queue *new;
 };
 
 static DEFINE_SPINLOCK(elv_list_lock);
@@ -158,14 +163,14 @@ static void elevator_exit(struct request_queue *q)
 {
 	struct elevator_queue *e = q->elevator;
 
+	lockdep_assert_held(&q->elevator_lock);
+
 	ioc_clear_queue(q);
 	blk_mq_sched_free_rqs(q);
 
 	mutex_lock(&e->sysfs_lock);
 	blk_mq_exit_sched(q, e);
 	mutex_unlock(&e->sysfs_lock);
-
-	kobject_put(&e->kobj);
 }
 
 static inline void __elv_rqhash_del(struct request *rq)
@@ -466,8 +471,6 @@ static int elv_register_queue(struct request_queue *q,
 {
 	int error;
 
-	lockdep_assert_held(&q->elevator_lock);
-
 	error = kobject_add(&e->kobj, &q->disk->queue_kobj, "iosched");
 	if (!error) {
 		const struct elv_fs_entry *attr = e->type->elevator_attrs;
@@ -494,8 +497,6 @@ static int elv_register_queue(struct request_queue *q,
 static void elv_unregister_queue(struct request_queue *q,
 				 struct elevator_queue *e)
 {
-	lockdep_assert_held(&q->elevator_lock);
-
 	if (e && test_and_clear_bit(ELEVATOR_FLAG_REGISTERED, &e->flags)) {
 		kobject_uevent(&e->kobj, KOBJ_REMOVE);
 		kobject_del(&e->kobj);
@@ -586,7 +587,7 @@ static int elevator_switch(struct request_queue *q, struct elv_change_ctx *ctx)
 	blk_mq_quiesce_queue(q);
 
 	if (q->elevator) {
-		elv_unregister_queue(q, q->elevator);
+		ctx->old = q->elevator;
 		elevator_exit(q);
 	}
 
@@ -594,11 +595,7 @@ static int elevator_switch(struct request_queue *q, struct elv_change_ctx *ctx)
 		ret = blk_mq_init_sched(q, new_e);
 		if (ret)
 			goto out_unfreeze;
-		ret = elv_register_queue(q, q->elevator, !ctx->no_uevent);
-		if (ret) {
-			elevator_exit(q);
-			goto out_unfreeze;
-		}
+		ctx->new = q->elevator;
 	} else {
 		blk_queue_flag_clear(QUEUE_FLAG_SQ_SCHED, q);
 		q->elevator = NULL;
@@ -619,6 +616,38 @@ static int elevator_switch(struct request_queue *q, struct elv_change_ctx *ctx)
 	return ret;
 }
 
+static void elv_exit_and_release(struct request_queue *q)
+{
+	struct elevator_queue *e;
+	unsigned memflags;
+
+	memflags = blk_mq_freeze_queue(q);
+	mutex_lock(&q->elevator_lock);
+	e = q->elevator;
+	elevator_exit(q);
+	mutex_unlock(&q->elevator_lock);
+	blk_mq_unfreeze_queue(q, memflags);
+	if (e)
+		kobject_put(&e->kobj);
+}
+
+static int elevator_change_done(struct request_queue *q,
+				struct elv_change_ctx *ctx)
+{
+	int ret = 0;
+
+	if (ctx->old) {
+		elv_unregister_queue(q, ctx->old);
+		kobject_put(&ctx->old->kobj);
+	}
+	if (ctx->new) {
+		ret = elv_register_queue(q, ctx->new, !ctx->no_uevent);
+		if (ret)
+			elv_exit_and_release(q);
+	}
+	return ret;
+}
+
 /*
  * Switch this queue to the given IO scheduler.
  */
@@ -645,6 +674,9 @@ static int elevator_change(struct request_queue *q, struct elv_change_ctx *ctx)
 		ret = elevator_switch(q, ctx);
 	mutex_unlock(&q->elevator_lock);
 	blk_mq_unfreeze_queue(q, memflags);
+	if (!ret)
+		ret = elevator_change_done(q, ctx);
+
 	return ret;
 }
 
@@ -654,18 +686,22 @@ static int elevator_change(struct request_queue *q, struct elv_change_ctx *ctx)
  */
 void elv_update_nr_hw_queues(struct request_queue *q)
 {
+	struct elv_change_ctx ctx = {};
+	int ret = -ENODEV;
+
 	WARN_ON_ONCE(q->mq_freeze_depth == 0);
 
 	mutex_lock(&q->elevator_lock);
 	if (q->elevator && !blk_queue_dying(q) && !blk_queue_registered(q)) {
-		struct elv_change_ctx ctx = {
-			.name = q->elevator->type->elevator_name,
-		};
+		ctx.name = q->elevator->type->elevator_name;
 
 		/* force to reattach elevator after nr_hw_queue is updated */
-		elevator_switch(q, &ctx);
+		ret = elevator_switch(q, &ctx);
 	}
 	mutex_unlock(&q->elevator_lock);
+	blk_mq_unfreeze_queue_nomemrestore(q);
+	if (!ret)
+		WARN_ON_ONCE(elevator_change_done(q, &ctx));
 }
 
 /*
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH V5 22/25] block: move hctx debugfs/sysfs registering out of freezing queue
  2025-05-05 14:17 [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Ming Lei
                   ` (20 preceding siblings ...)
  2025-05-05 14:17 ` [PATCH V5 21/25] block: move elv_register[unregister]_queue out of elevator_lock Ming Lei
@ 2025-05-05 14:18 ` Ming Lei
  2025-05-05 14:18 ` [PATCH V5 23/25] block: don't acquire ->elevator_lock in blk_mq_map_swqueue and blk_mq_realloc_hw_ctxs Ming Lei
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2025-05-05 14:18 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Nilay Shroff, Shinichiro Kawasaki, Thomas Hellström,
	Christoph Hellwig, Ming Lei, Hannes Reinecke

Move hctx debugfs/sysfs register out of freezing queue in
__blk_mq_update_nr_hw_queues(), so that the following lockdep dependency
can be killed:

	#2 (&q->q_usage_counter(io)#16){++++}-{0:0}:
	#1 (fs_reclaim){+.+.}-{0:0}:
	#0 (&sb->s_type->i_mutex_key#3){+.+.}-{4:4}: //debugfs

And registering/un-registering hctx debugfs/sysfs does not require queue to
be frozen:

- hctx sysfs attributes show() are drained when removing kobject, and
  there isn't store() implementation for hctx sysfs attributes

- debugfs entry read() is drained too when removing debugfs directory,
  and there isn't write() implementation for hctx debugfs too

- so it is safe to register/unregister hctx sysfs/debugfs without
  freezing queue because the cod paths changes nothing, and we just
  need to keep hctx live

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index d1b5b75840eb..32526d120bf4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -5007,14 +5007,14 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 		return;
 
 	memflags = memalloc_noio_save();
-	list_for_each_entry(q, &set->tag_list, tag_set_list)
-		blk_mq_freeze_queue_nomemsave(q);
-
 	list_for_each_entry(q, &set->tag_list, tag_set_list) {
 		blk_mq_debugfs_unregister_hctxs(q);
 		blk_mq_sysfs_unregister_hctxs(q);
 	}
 
+	list_for_each_entry(q, &set->tag_list, tag_set_list)
+		blk_mq_freeze_queue_nomemsave(q);
+
 	if (blk_mq_realloc_tag_set_tags(set, nr_hw_queues) < 0)
 		goto reregister;
 
@@ -5037,16 +5037,15 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 		blk_mq_map_swqueue(q);
 	}
 
+	/* elv_update_nr_hw_queues() unfreeze queue for us */
+	list_for_each_entry(q, &set->tag_list, tag_set_list)
+		elv_update_nr_hw_queues(q);
+
 reregister:
 	list_for_each_entry(q, &set->tag_list, tag_set_list) {
 		blk_mq_sysfs_register_hctxs(q);
 		blk_mq_debugfs_register_hctxs(q);
 	}
-
-	/* elv_update_nr_hw_queues() unfreeze queue for us */
-	list_for_each_entry(q, &set->tag_list, tag_set_list)
-		elv_update_nr_hw_queues(q);
-
 	memalloc_noio_restore(memflags);
 
 	/* Free the excess tags when nr_hw_queues shrink. */
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH V5 23/25] block: don't acquire ->elevator_lock in blk_mq_map_swqueue and blk_mq_realloc_hw_ctxs
  2025-05-05 14:17 [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Ming Lei
                   ` (21 preceding siblings ...)
  2025-05-05 14:18 ` [PATCH V5 22/25] block: move hctx debugfs/sysfs registering out of freezing queue Ming Lei
@ 2025-05-05 14:18 ` Ming Lei
  2025-05-06  4:44   ` Christoph Hellwig
  2025-05-05 14:18 ` [PATCH V5 24/25] block: move hctx cpuhp add/del out of queue freezing Ming Lei
                   ` (2 subsequent siblings)
  25 siblings, 1 reply; 42+ messages in thread
From: Ming Lei @ 2025-05-05 14:18 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Nilay Shroff, Shinichiro Kawasaki, Thomas Hellström,
	Christoph Hellwig, Ming Lei, Hannes Reinecke

Both blk_mq_map_swqueue() and blk_mq_realloc_hw_ctxs() are called before
the request queue is added to tagset list, so the two won't run concurrently
with blk_mq_update_nr_hw_queues().

When the two functions are only called from queue initialization or
blk_mq_update_nr_hw_queues(), elevator switch can't happen.

So remove ->elevator_lock uses from the two functions.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 19 ++++---------------
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 32526d120bf4..c6098e569c02 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4158,8 +4158,6 @@ static void blk_mq_map_swqueue(struct request_queue *q)
 	struct blk_mq_ctx *ctx;
 	struct blk_mq_tag_set *set = q->tag_set;
 
-	mutex_lock(&q->elevator_lock);
-
 	queue_for_each_hw_ctx(q, hctx, i) {
 		cpumask_clear(hctx->cpumask);
 		hctx->nr_ctx = 0;
@@ -4264,8 +4262,6 @@ static void blk_mq_map_swqueue(struct request_queue *q)
 		hctx->next_cpu = blk_mq_first_mapped_cpu(hctx);
 		hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
 	}
-
-	mutex_unlock(&q->elevator_lock);
 }
 
 /*
@@ -4569,16 +4565,9 @@ static void __blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
 }
 
 static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
-				   struct request_queue *q, bool lock)
+				   struct request_queue *q)
 {
-	if (lock) {
-		/* protect against switching io scheduler  */
-		mutex_lock(&q->elevator_lock);
-		__blk_mq_realloc_hw_ctxs(set, q);
-		mutex_unlock(&q->elevator_lock);
-	} else {
-		__blk_mq_realloc_hw_ctxs(set, q);
-	}
+	__blk_mq_realloc_hw_ctxs(set, q);
 
 	/* unregister cpuhp callbacks for exited hctxs */
 	blk_mq_remove_hw_queues_cpuhp(q);
@@ -4610,7 +4599,7 @@ int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 
 	xa_init(&q->hctx_table);
 
-	blk_mq_realloc_hw_ctxs(set, q, false);
+	blk_mq_realloc_hw_ctxs(set, q);
 	if (!q->nr_hw_queues)
 		goto err_hctxs;
 
@@ -5021,7 +5010,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 fallback:
 	blk_mq_update_queue_map(set);
 	list_for_each_entry(q, &set->tag_list, tag_set_list) {
-		blk_mq_realloc_hw_ctxs(set, q, true);
+		blk_mq_realloc_hw_ctxs(set, q);
 
 		if (q->nr_hw_queues != set->nr_hw_queues) {
 			int i = prev_nr_hw_queues;
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH V5 24/25] block: move hctx cpuhp add/del out of queue freezing
  2025-05-05 14:17 [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Ming Lei
                   ` (22 preceding siblings ...)
  2025-05-05 14:18 ` [PATCH V5 23/25] block: don't acquire ->elevator_lock in blk_mq_map_swqueue and blk_mq_realloc_hw_ctxs Ming Lei
@ 2025-05-05 14:18 ` Ming Lei
  2025-05-06  4:44   ` Christoph Hellwig
  2025-05-05 14:18 ` [PATCH V5 25/25] block: move wbt_enable_default() out of queue freezing from sched ->exit() Ming Lei
  2025-05-06 13:48 ` [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Jens Axboe
  25 siblings, 1 reply; 42+ messages in thread
From: Ming Lei @ 2025-05-05 14:18 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Nilay Shroff, Shinichiro Kawasaki, Thomas Hellström,
	Christoph Hellwig, Ming Lei, Hannes Reinecke

Move hctx cpuhp add/del out of queue freezing for not connecting freeze
lock with cpuhp locks, then lockdep warning can be avoided.

This way is safe because both needn't queue to be frozen and scheduler
switch isn't allowed, with same reason for moving hctx debugfs/sysfs
register out of queue freeze.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index c6098e569c02..88d5546e54f4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -5010,7 +5010,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 fallback:
 	blk_mq_update_queue_map(set);
 	list_for_each_entry(q, &set->tag_list, tag_set_list) {
-		blk_mq_realloc_hw_ctxs(set, q);
+		__blk_mq_realloc_hw_ctxs(set, q);
 
 		if (q->nr_hw_queues != set->nr_hw_queues) {
 			int i = prev_nr_hw_queues;
@@ -5034,6 +5034,9 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 	list_for_each_entry(q, &set->tag_list, tag_set_list) {
 		blk_mq_sysfs_register_hctxs(q);
 		blk_mq_debugfs_register_hctxs(q);
+
+		blk_mq_remove_hw_queues_cpuhp(q);
+		blk_mq_add_hw_queues_cpuhp(q);
 	}
 	memalloc_noio_restore(memflags);
 
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH V5 25/25] block: move wbt_enable_default() out of queue freezing from sched ->exit()
  2025-05-05 14:17 [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Ming Lei
                   ` (23 preceding siblings ...)
  2025-05-05 14:18 ` [PATCH V5 24/25] block: move hctx cpuhp add/del out of queue freezing Ming Lei
@ 2025-05-05 14:18 ` Ming Lei
  2025-05-06  4:44   ` Christoph Hellwig
  2025-05-06 13:48 ` [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Jens Axboe
  25 siblings, 1 reply; 42+ messages in thread
From: Ming Lei @ 2025-05-05 14:18 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Nilay Shroff, Shinichiro Kawasaki, Thomas Hellström,
	Christoph Hellwig, Ming Lei, Hannes Reinecke

scheduler's ->exit() is called with queue frozen and elevator lock is held, and
wbt_enable_default() can't be called with queue frozen, otherwise the
following lockdep warning is triggered:

	#6 (&q->rq_qos_mutex){+.+.}-{4:4}:
	#5 (&eq->sysfs_lock){+.+.}-{4:4}:
	#4 (&q->elevator_lock){+.+.}-{4:4}:
	#3 (&q->q_usage_counter(io)#3){++++}-{0:0}:
	#2 (fs_reclaim){+.+.}-{0:0}:
	#1 (&sb->s_type->i_mutex_key#3){+.+.}-{4:4}:
	#0 (&q->debugfs_mutex){+.+.}-{4:4}:

Fix the issue by moving wbt_enable_default() out of bfq's exit(), and
call it from elevator_change_done().

Meantime add disk->rqos_state_mutex for covering wbt state change, which
matches the purpose more than ->elevator_lock.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/bfq-iosched.c    |  2 +-
 block/blk-sysfs.c      | 10 ++++------
 block/blk-wbt.c        |  6 ++++++
 block/elevator.c       |  5 +++++
 block/elevator.h       |  1 +
 block/genhd.c          |  1 +
 include/linux/blkdev.h |  2 ++
 7 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index cc6f59836dcd..0cb1e9873aab 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -7211,7 +7211,7 @@ static void bfq_exit_queue(struct elevator_queue *e)
 
 	blk_stat_disable_accounting(bfqd->queue);
 	blk_queue_flag_clear(QUEUE_FLAG_DISABLE_WBT_DEF, bfqd->queue);
-	wbt_enable_default(bfqd->queue->disk);
+	set_bit(ELEVATOR_FLAG_ENABLE_WBT_ON_EXIT, &e->flags);
 
 	kfree(bfqd);
 }
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 741e607dfab6..01e0ead13278 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -560,7 +560,7 @@ static ssize_t queue_wb_lat_show(struct gendisk *disk, char *page)
 	ssize_t ret;
 	struct request_queue *q = disk->queue;
 
-	mutex_lock(&q->elevator_lock);
+	mutex_lock(&disk->rqos_state_mutex);
 	if (!wbt_rq_qos(q)) {
 		ret = -EINVAL;
 		goto out;
@@ -573,7 +573,7 @@ static ssize_t queue_wb_lat_show(struct gendisk *disk, char *page)
 
 	ret = sysfs_emit(page, "%llu\n", div_u64(wbt_get_min_lat(q), 1000));
 out:
-	mutex_unlock(&q->elevator_lock);
+	mutex_unlock(&disk->rqos_state_mutex);
 	return ret;
 }
 
@@ -593,7 +593,6 @@ static ssize_t queue_wb_lat_store(struct gendisk *disk, const char *page,
 		return -EINVAL;
 
 	memflags = blk_mq_freeze_queue(q);
-	mutex_lock(&q->elevator_lock);
 
 	rqos = wbt_rq_qos(q);
 	if (!rqos) {
@@ -618,11 +617,12 @@ static ssize_t queue_wb_lat_store(struct gendisk *disk, const char *page,
 	 */
 	blk_mq_quiesce_queue(q);
 
+	mutex_lock(&disk->rqos_state_mutex);
 	wbt_set_min_lat(q, val);
+	mutex_unlock(&disk->rqos_state_mutex);
 
 	blk_mq_unquiesce_queue(q);
 out:
-	mutex_unlock(&q->elevator_lock);
 	blk_mq_unfreeze_queue(q, memflags);
 
 	return ret;
@@ -871,9 +871,7 @@ int blk_register_queue(struct gendisk *disk)
 
 	if (queue_is_mq(q))
 		elevator_set_default(q);
-	mutex_lock(&q->elevator_lock);
 	wbt_enable_default(disk);
-	mutex_unlock(&q->elevator_lock);
 
 	blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q);
 
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index 29cd2e33666f..74ae7131ada9 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -704,6 +704,8 @@ void wbt_enable_default(struct gendisk *disk)
 	struct rq_qos *rqos;
 	bool enable = IS_ENABLED(CONFIG_BLK_WBT_MQ);
 
+	mutex_lock(&disk->rqos_state_mutex);
+
 	if (blk_queue_disable_wbt(q))
 		enable = false;
 
@@ -712,8 +714,10 @@ void wbt_enable_default(struct gendisk *disk)
 	if (rqos) {
 		if (enable && RQWB(rqos)->enable_state == WBT_STATE_OFF_DEFAULT)
 			RQWB(rqos)->enable_state = WBT_STATE_ON_DEFAULT;
+		mutex_unlock(&disk->rqos_state_mutex);
 		return;
 	}
+	mutex_unlock(&disk->rqos_state_mutex);
 
 	/* Queue not registered? Maybe shutting down... */
 	if (!blk_queue_registered(q))
@@ -773,11 +777,13 @@ void wbt_disable_default(struct gendisk *disk)
 	struct rq_wb *rwb;
 	if (!rqos)
 		return;
+	mutex_lock(&disk->rqos_state_mutex);
 	rwb = RQWB(rqos);
 	if (rwb->enable_state == WBT_STATE_ON_DEFAULT) {
 		blk_stat_deactivate(rwb->cb);
 		rwb->enable_state = WBT_STATE_OFF_DEFAULT;
 	}
+	mutex_unlock(&disk->rqos_state_mutex);
 }
 EXPORT_SYMBOL_GPL(wbt_disable_default);
 
diff --git a/block/elevator.c b/block/elevator.c
index 8578b969e173..f8d72bd20610 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -637,8 +637,13 @@ static int elevator_change_done(struct request_queue *q,
 	int ret = 0;
 
 	if (ctx->old) {
+		bool enable_wbt = test_bit(ELEVATOR_FLAG_ENABLE_WBT_ON_EXIT,
+				&ctx->old->flags);
+
 		elv_unregister_queue(q, ctx->old);
 		kobject_put(&ctx->old->kobj);
+		if (enable_wbt)
+			wbt_enable_default(q->disk);
 	}
 	if (ctx->new) {
 		ret = elv_register_queue(q, ctx->new, !ctx->no_uevent);
diff --git a/block/elevator.h b/block/elevator.h
index 76a90a1b7ed6..a07ce773a38f 100644
--- a/block/elevator.h
+++ b/block/elevator.h
@@ -122,6 +122,7 @@ struct elevator_queue
 
 #define ELEVATOR_FLAG_REGISTERED	0
 #define ELEVATOR_FLAG_DYING		1
+#define ELEVATOR_FLAG_ENABLE_WBT_ON_EXIT	2
 
 /*
  * block elevator interface
diff --git a/block/genhd.c b/block/genhd.c
index a8cb5607b6e3..9c7c657380db 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1470,6 +1470,7 @@ struct gendisk *__alloc_disk_node(struct request_queue *q, int node_id,
 #ifdef CONFIG_BLOCK_HOLDER_DEPRECATED
 	INIT_LIST_HEAD(&disk->slave_bdevs);
 #endif
+	mutex_init(&disk->rqos_state_mutex);
 	return disk;
 
 out_erase_part0:
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index b15c53fabe9f..c19ae1877061 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -218,6 +218,8 @@ struct gendisk {
 	 * devices that do not have multiple independent access ranges.
 	 */
 	struct blk_independent_access_ranges *ia_ranges;
+
+	struct mutex rqos_state_mutex;	/* rqos state change mutex */
 };
 
 /**
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH V5 07/25] block: add helper add_disk_final()
  2025-05-05 14:17 ` [PATCH V5 07/25] block: add helper add_disk_final() Ming Lei
@ 2025-05-06  4:40   ` Christoph Hellwig
  2025-05-06  7:43   ` Nilay Shroff
  2025-05-06 11:02   ` Hannes Reinecke
  2 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2025-05-06  4:40 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Nilay Shroff, Shinichiro Kawasaki,
	Thomas Hellström, Christoph Hellwig

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V5 09/25] block: don't allow to switch elevator if updating nr_hw_queues is in-progress
  2025-05-05 14:17 ` [PATCH V5 09/25] block: don't allow to switch elevator if updating nr_hw_queues is in-progress Ming Lei
@ 2025-05-06  4:41   ` Christoph Hellwig
  2025-05-06  6:26   ` Nilay Shroff
  1 sibling, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2025-05-06  4:41 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Nilay Shroff, Shinichiro Kawasaki,
	Thomas Hellström, Christoph Hellwig, Hannes Reinecke

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V5 12/25] block: move blk_queue_registered() check into elv_iosched_store()
  2025-05-05 14:17 ` [PATCH V5 12/25] block: move blk_queue_registered() check into elv_iosched_store() Ming Lei
@ 2025-05-06  4:41   ` Christoph Hellwig
  2025-05-06  7:47   ` Nilay Shroff
  1 sibling, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2025-05-06  4:41 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Nilay Shroff, Shinichiro Kawasaki,
	Thomas Hellström, Christoph Hellwig, Hannes Reinecke

On Mon, May 05, 2025 at 10:17:50PM +0800, Ming Lei wrote:
> Move blk_queue_registered() check into elv_iosched_store() and prepare
> for using elevator_change() for covering any kind of elevator change in
> adding/deleting disk and updating nr_hw_queue.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

(although I suspect we don't need the check at all, but that can be
dealt with later)


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V5 15/25] block: add `struct elv_change_ctx` for unifying elevator change
  2025-05-05 14:17 ` [PATCH V5 15/25] block: add `struct elv_change_ctx` for unifying elevator change Ming Lei
@ 2025-05-06  4:42   ` Christoph Hellwig
  0 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2025-05-06  4:42 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Nilay Shroff, Shinichiro Kawasaki,
	Thomas Hellström, Christoph Hellwig, Hannes Reinecke

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V5 21/25] block: move elv_register[unregister]_queue out of elevator_lock
  2025-05-05 14:17 ` [PATCH V5 21/25] block: move elv_register[unregister]_queue out of elevator_lock Ming Lei
@ 2025-05-06  4:43   ` Christoph Hellwig
  2025-05-06  6:36   ` Nilay Shroff
  1 sibling, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2025-05-06  4:43 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Nilay Shroff, Shinichiro Kawasaki,
	Thomas Hellström, Christoph Hellwig, Hannes Reinecke

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V5 23/25] block: don't acquire ->elevator_lock in blk_mq_map_swqueue and blk_mq_realloc_hw_ctxs
  2025-05-05 14:18 ` [PATCH V5 23/25] block: don't acquire ->elevator_lock in blk_mq_map_swqueue and blk_mq_realloc_hw_ctxs Ming Lei
@ 2025-05-06  4:44   ` Christoph Hellwig
  0 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2025-05-06  4:44 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Nilay Shroff, Shinichiro Kawasaki,
	Thomas Hellström, Christoph Hellwig, Hannes Reinecke

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V5 24/25] block: move hctx cpuhp add/del out of queue freezing
  2025-05-05 14:18 ` [PATCH V5 24/25] block: move hctx cpuhp add/del out of queue freezing Ming Lei
@ 2025-05-06  4:44   ` Christoph Hellwig
  0 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2025-05-06  4:44 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Nilay Shroff, Shinichiro Kawasaki,
	Thomas Hellström, Christoph Hellwig, Hannes Reinecke

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V5 25/25] block: move wbt_enable_default() out of queue freezing from sched ->exit()
  2025-05-05 14:18 ` [PATCH V5 25/25] block: move wbt_enable_default() out of queue freezing from sched ->exit() Ming Lei
@ 2025-05-06  4:44   ` Christoph Hellwig
  0 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2025-05-06  4:44 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Nilay Shroff, Shinichiro Kawasaki,
	Thomas Hellström, Christoph Hellwig, Hannes Reinecke

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V5 09/25] block: don't allow to switch elevator if updating nr_hw_queues is in-progress
  2025-05-05 14:17 ` [PATCH V5 09/25] block: don't allow to switch elevator if updating nr_hw_queues is in-progress Ming Lei
  2025-05-06  4:41   ` Christoph Hellwig
@ 2025-05-06  6:26   ` Nilay Shroff
  1 sibling, 0 replies; 42+ messages in thread
From: Nilay Shroff @ 2025-05-06  6:26 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block
  Cc: Shinichiro Kawasaki, Thomas Hellström, Christoph Hellwig,
	Hannes Reinecke



On 5/5/25 7:47 PM, Ming Lei wrote:
> Elevator switch code is another `nr_hw_queue` reader in non-fast-IO code
> path, so it can't be done if updating `nr_hw_queues` is in-progress.
> 
> Take same approach with not allowing add/del disk when updating
> nr_hw_queues is in-progress, by grabbing read lock of
> set->update_nr_hwq_sema.
> 
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Link: https://lore.kernel.org/linux-block/aAWv3NPtNIKKvJZc@fedora/ [1]
> Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Closes: https://lore.kernel.org/linux-block/mz4t4tlwiqjijw3zvqnjb7ovvvaegkqganegmmlc567tt5xj67@xal5ro544cnc/
> Signed-off-by: Ming Lei <ming.lei@redhat.com>

Looks good to me:
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V5 20/25] block: add new helper for disabling elevator switch when deleting disk
  2025-05-05 14:17 ` [PATCH V5 20/25] block: add new helper for disabling elevator switch when deleting disk Ming Lei
@ 2025-05-06  6:32   ` Nilay Shroff
  0 siblings, 0 replies; 42+ messages in thread
From: Nilay Shroff @ 2025-05-06  6:32 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block
  Cc: Shinichiro Kawasaki, Thomas Hellström, Christoph Hellwig,
	Hannes Reinecke



On 5/5/25 7:47 PM, Ming Lei wrote:
> Add new helper disable_elv_switch() and new flag QUEUE_FLAG_NO_ELV_SWITCH
> for disabling elevator switch before deleting disk:
> 
> - originally flag QUEUE_FLAG_REGISTERED is added for preventing elevator
> switch during removing disk, but this flag has been used widely for
> other purposes, so add one new flag for disabling elevator switch only
> 
> - for avoiding deadlock risk, we have to move elevator queue
> register/unregister out of elevator lock and queue freeze, which will be
> done in next patch. However, this way adds small race window between elevator
> switch and deleting ->queue_kobj, in which elevator queue register/unregister
> could be run concurrently. The added helper will be used for avoiding the race
> in the following patch.
> 
> - drain in-progress elevator switch before deleting disk
> 
> Suggested-by: Nilay Shroff <nilay@linux.ibm.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>

Looks good to me:
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V5 21/25] block: move elv_register[unregister]_queue out of elevator_lock
  2025-05-05 14:17 ` [PATCH V5 21/25] block: move elv_register[unregister]_queue out of elevator_lock Ming Lei
  2025-05-06  4:43   ` Christoph Hellwig
@ 2025-05-06  6:36   ` Nilay Shroff
  1 sibling, 0 replies; 42+ messages in thread
From: Nilay Shroff @ 2025-05-06  6:36 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block
  Cc: Shinichiro Kawasaki, Thomas Hellström, Christoph Hellwig,
	Hannes Reinecke



On 5/5/25 7:47 PM, Ming Lei wrote:
> Move elv_register[unregister]_queue out of ->elevator_lock & queue freezing,
> so we can kill many lockdep warnings.
> 
> elv_register[unregister]_queue() is serialized, and just dealing with sysfs/
> debugfs things, no need to be done with queue frozen:
> 
> - when it is called from adding disk, elevator switch isn't possible
>   because ->queue_kobj isn't added yet
> 
> - when it is called from deleting disk, disable_elv_switch() is
>   responsible for preventing new elevator switch and draining old
>   elevator switch.
> 
> - when it is called from blk_mq_update_nr_hw_queues(), adding/removing
>   disk and elevator switch can't be allowed or in-progress
> 
> With this change, elevator's ->exit() is called before calling
> elv_unregister_queue, then user may call into ->show()/store() of elevator's
> sysfs attributes, and we have covered this issue by adding `ELEVATOR_FLAG_DYNG`.
> 
> For blk-mq debugfs, hctx->sched_tags is always checked with ->elevator_lock by
> debugfs code, meantime hctx->sched_tags is updated with ->elevator_lock, so
> there isn't such issue.
> 
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>

Looks good to me:
Reviewed-by : Nilay Shroff <nilay@linux.ibm.com>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V5 07/25] block: add helper add_disk_final()
  2025-05-05 14:17 ` [PATCH V5 07/25] block: add helper add_disk_final() Ming Lei
  2025-05-06  4:40   ` Christoph Hellwig
@ 2025-05-06  7:43   ` Nilay Shroff
  2025-05-06 11:02   ` Hannes Reinecke
  2 siblings, 0 replies; 42+ messages in thread
From: Nilay Shroff @ 2025-05-06  7:43 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block
  Cc: Shinichiro Kawasaki, Thomas Hellström, Christoph Hellwig



On 5/5/25 7:47 PM, Ming Lei wrote:
> Add helper add_disk_final() for scanning partitions, announcing disk and
> handling the last thing for adding disk.
> 
> No functional change, and prepare for prevent adding disk from happening
> when updating nr_hw_queues.
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>

Looks good to me:
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V5 12/25] block: move blk_queue_registered() check into elv_iosched_store()
  2025-05-05 14:17 ` [PATCH V5 12/25] block: move blk_queue_registered() check into elv_iosched_store() Ming Lei
  2025-05-06  4:41   ` Christoph Hellwig
@ 2025-05-06  7:47   ` Nilay Shroff
  1 sibling, 0 replies; 42+ messages in thread
From: Nilay Shroff @ 2025-05-06  7:47 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block
  Cc: Shinichiro Kawasaki, Thomas Hellström, Christoph Hellwig,
	Hannes Reinecke



On 5/5/25 7:47 PM, Ming Lei wrote:
> Move blk_queue_registered() check into elv_iosched_store() and prepare
> for using elevator_change() for covering any kind of elevator change in
> adding/deleting disk and updating nr_hw_queue.
> 
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>

Looks good to me:
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V5 07/25] block: add helper add_disk_final()
  2025-05-05 14:17 ` [PATCH V5 07/25] block: add helper add_disk_final() Ming Lei
  2025-05-06  4:40   ` Christoph Hellwig
  2025-05-06  7:43   ` Nilay Shroff
@ 2025-05-06 11:02   ` Hannes Reinecke
  2 siblings, 0 replies; 42+ messages in thread
From: Hannes Reinecke @ 2025-05-06 11:02 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block
  Cc: Nilay Shroff, Shinichiro Kawasaki, Thomas Hellström,
	Christoph Hellwig

On 5/5/25 16:17, Ming Lei wrote:
> Add helper add_disk_final() for scanning partitions, announcing disk and
> handling the last thing for adding disk.
> 
> No functional change, and prepare for prevent adding disk from happening
> when updating nr_hw_queues.
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>   block/genhd.c | 46 +++++++++++++++++++++++++++-------------------
>   1 file changed, 27 insertions(+), 19 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V5 19/25] block: fail to show/store elevator sysfs attribute if elevator is dying
  2025-05-05 14:17 ` [PATCH V5 19/25] block: fail to show/store elevator sysfs attribute if elevator is dying Ming Lei
@ 2025-05-06 11:09   ` Hannes Reinecke
  0 siblings, 0 replies; 42+ messages in thread
From: Hannes Reinecke @ 2025-05-06 11:09 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block
  Cc: Nilay Shroff, Shinichiro Kawasaki, Thomas Hellström,
	Christoph Hellwig

On 5/5/25 16:17, Ming Lei wrote:
> Prepare for moving elv_register[unregister]_queue out of elevator_lock
> & queue freezing, so we may have to call elv_unregister_queue() after
> elevator ->exit() is called, then there is small window for user to
> call into ->show()/store(), and user-after-free can be caused.
> 
> Fail to show/store elevator sysfs attribute if elevator is dying by
> adding one new flag of ELEVATOR_FLAG_DYNG, which is protected by
> elevator ->sysfs_lock.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>   block/blk-mq-sched.c |  1 +
>   block/elevator.c     | 10 ++++++----
>   block/elevator.h     |  1 +
>   3 files changed, 8 insertions(+), 4 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning
  2025-05-05 14:17 [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Ming Lei
                   ` (24 preceding siblings ...)
  2025-05-05 14:18 ` [PATCH V5 25/25] block: move wbt_enable_default() out of queue freezing from sched ->exit() Ming Lei
@ 2025-05-06 13:48 ` Jens Axboe
  25 siblings, 0 replies; 42+ messages in thread
From: Jens Axboe @ 2025-05-06 13:48 UTC (permalink / raw)
  To: linux-block, Ming Lei
  Cc: Nilay Shroff, Shinichiro Kawasaki, Thomas Hellström,
	Christoph Hellwig


On Mon, 05 May 2025 22:17:38 +0800, Ming Lei wrote:
> This patchset cleans up elevator change code, and unifying it via single
> helper, meantime moves kobject_add/del & debugfs register/unregister out of
> queue freezing & elevator_lock. This way fixes many lockdep warnings
> reported recently, especially since fs_reclaim is connected with freeze lock
> manually by commit ffa1e7ada456 ("block: Make request_queue lockdep splats
> show up earlier").
> 
> [...]

Applied, thanks!

[01/25] block: move blk_mq_add_queue_tag_set() after blk_mq_map_swqueue()
        commit: f24d47edd1119b162a986bf1e88f30ec88c28029
[02/25] block: move ELEVATOR_FLAG_DISABLE_WBT a request queue flag
        commit: 56dee46ff47f0ef9944dddd1fd137c94b7c2d9de
[03/25] block: don't call freeze queue in elevator_switch() and elevator_disable()
        commit: f8e111c859b92ee909f1676f90c791e7165d3860
[04/25] block: use q->elevator with ->elevator_lock held in elv_iosched_show()
        commit: 94209d27d14104ed828ca88cd5403a99162fe51a
[05/25] block: add two helpers for registering/un-registering sched debugfs
        commit: ed3896acdcf038888a80a02dd264099e35f76b47
[06/25] block: move sched debugfs register into elvevator_register_queue
        commit: 92c22d7efcdf92412ff70eb175d424d9c24ac07f
[07/25] block: add helper add_disk_final()
        commit: 5fad1490ef510e3b70ad8b0a5a1e28a26638a95f
[08/25] block: prevent adding/deleting disk during updating nr_hw_queues
        commit: 98e68f67020ce30e1a4d8e2d05d85a453738dfb8
[09/25] block: don't allow to switch elevator if updating nr_hw_queues is in-progress
        commit: b126d9d7475e3a35155f31418e54d9221b971ca1
[10/25] block: look up the elevator type in elevator_switch
        commit: a11abb98388e23188f3915780f3a193fdc1e4ff0
[11/25] block: fold elevator_disable into elevator_switch
        commit: 1bb7fba0e262e71f9355dc46c10b9da3c92f3d2b
[12/25] block: move blk_queue_registered() check into elv_iosched_store()
        commit: ac55b71a31a7287342e622c6f4de201e54b1c195
[13/25] block: simplify elevator reattachment for updating nr_hw_queues
        commit: 596dce110b7d543db727e6957ae7adf35beb0633
[14/25] block: move queue freezing & elevator_lock into elevator_change()
        commit: 20117b5a4b9c6dbb9414f0451111c3f13a37874a
[15/25] block: add `struct elv_change_ctx` for unifying elevator change
        commit: 1e9db5c42730e9ffd32cb922775de4873ec1d8ee
[16/25] block: unifying elevator change
        commit: 1e44bedbc921a35cb847991953814a50f738bcf3
[17/25] block: pass elevator_queue to elv_register_queue & unregister_queue
        commit: a3dc6279c2d5e2653b198684eb8857f414b6768f
[18/25] block: remove elevator queue's type check in elv_attr_show/store()
        commit: e25ee50dfab9fce77d2e0d89d2413b6c68015f97
[19/25] block: fail to show/store elevator sysfs attribute if elevator is dying
        commit: 5c3d858cdc57196e6d438e5ad47a732216e81a9c
[20/25] block: add new helper for disabling elevator switch when deleting disk
        commit: 21eed794ab4bd1a6c82a55df4416d18fb4d21da9
[21/25] block: move elv_register[unregister]_queue out of elevator_lock
        commit: 559dc11143eb468b2099b403d3a8d5c7fce32b96
[22/25] block: move hctx debugfs/sysfs registering out of freezing queue
        commit: 9dc7a882ce96482bfff3dba51dadcbe68daeac6c
[23/25] block: don't acquire ->elevator_lock in blk_mq_map_swqueue and blk_mq_realloc_hw_ctxs
        commit: 0a47d2b433ad275236d625b9f09c6d3672329712
[24/25] block: move hctx cpuhp add/del out of queue freezing
        commit: 7ed7fa561c357d1ff0d5938446662b2ea4b26bb3
[25/25] block: move wbt_enable_default() out of queue freezing from sched ->exit()
        commit: 78c271344b6f64ce24c845e54903e09928cf2061

Best regards,
-- 
Jens Axboe




^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2025-05-06 13:48 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-05 14:17 [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Ming Lei
2025-05-05 14:17 ` [PATCH V5 01/25] block: move blk_mq_add_queue_tag_set() after blk_mq_map_swqueue() Ming Lei
2025-05-05 14:17 ` [PATCH V5 02/25] block: move ELEVATOR_FLAG_DISABLE_WBT a request queue flag Ming Lei
2025-05-05 14:17 ` [PATCH V5 03/25] block: don't call freeze queue in elevator_switch() and elevator_disable() Ming Lei
2025-05-05 14:17 ` [PATCH V5 04/25] block: use q->elevator with ->elevator_lock held in elv_iosched_show() Ming Lei
2025-05-05 14:17 ` [PATCH V5 05/25] block: add two helpers for registering/un-registering sched debugfs Ming Lei
2025-05-05 14:17 ` [PATCH V5 06/25] block: move sched debugfs register into elvevator_register_queue Ming Lei
2025-05-05 14:17 ` [PATCH V5 07/25] block: add helper add_disk_final() Ming Lei
2025-05-06  4:40   ` Christoph Hellwig
2025-05-06  7:43   ` Nilay Shroff
2025-05-06 11:02   ` Hannes Reinecke
2025-05-05 14:17 ` [PATCH V5 08/25] block: prevent adding/deleting disk during updating nr_hw_queues Ming Lei
2025-05-05 14:17 ` [PATCH V5 09/25] block: don't allow to switch elevator if updating nr_hw_queues is in-progress Ming Lei
2025-05-06  4:41   ` Christoph Hellwig
2025-05-06  6:26   ` Nilay Shroff
2025-05-05 14:17 ` [PATCH V5 10/25] block: look up the elevator type in elevator_switch Ming Lei
2025-05-05 14:17 ` [PATCH V5 11/25] block: fold elevator_disable into elevator_switch Ming Lei
2025-05-05 14:17 ` [PATCH V5 12/25] block: move blk_queue_registered() check into elv_iosched_store() Ming Lei
2025-05-06  4:41   ` Christoph Hellwig
2025-05-06  7:47   ` Nilay Shroff
2025-05-05 14:17 ` [PATCH V5 13/25] block: simplify elevator reattachment for updating nr_hw_queues Ming Lei
2025-05-05 14:17 ` [PATCH V5 14/25] block: move queue freezing & elevator_lock into elevator_change() Ming Lei
2025-05-05 14:17 ` [PATCH V5 15/25] block: add `struct elv_change_ctx` for unifying elevator change Ming Lei
2025-05-06  4:42   ` Christoph Hellwig
2025-05-05 14:17 ` [PATCH V5 16/25] block: " Ming Lei
2025-05-05 14:17 ` [PATCH V5 17/25] block: pass elevator_queue to elv_register_queue & unregister_queue Ming Lei
2025-05-05 14:17 ` [PATCH V5 18/25] block: remove elevator queue's type check in elv_attr_show/store() Ming Lei
2025-05-05 14:17 ` [PATCH V5 19/25] block: fail to show/store elevator sysfs attribute if elevator is dying Ming Lei
2025-05-06 11:09   ` Hannes Reinecke
2025-05-05 14:17 ` [PATCH V5 20/25] block: add new helper for disabling elevator switch when deleting disk Ming Lei
2025-05-06  6:32   ` Nilay Shroff
2025-05-05 14:17 ` [PATCH V5 21/25] block: move elv_register[unregister]_queue out of elevator_lock Ming Lei
2025-05-06  4:43   ` Christoph Hellwig
2025-05-06  6:36   ` Nilay Shroff
2025-05-05 14:18 ` [PATCH V5 22/25] block: move hctx debugfs/sysfs registering out of freezing queue Ming Lei
2025-05-05 14:18 ` [PATCH V5 23/25] block: don't acquire ->elevator_lock in blk_mq_map_swqueue and blk_mq_realloc_hw_ctxs Ming Lei
2025-05-06  4:44   ` Christoph Hellwig
2025-05-05 14:18 ` [PATCH V5 24/25] block: move hctx cpuhp add/del out of queue freezing Ming Lei
2025-05-06  4:44   ` Christoph Hellwig
2025-05-05 14:18 ` [PATCH V5 25/25] block: move wbt_enable_default() out of queue freezing from sched ->exit() Ming Lei
2025-05-06  4:44   ` Christoph Hellwig
2025-05-06 13:48 ` [PATCH V5 00/25] block: unify elevator changing and fix lockdep warning Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox