linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/9] blk-mq: fix possible deadlocks
@ 2025-11-21  6:28 Yu Kuai
  2025-11-21  6:28 ` [PATCH v2 1/9] blk-mq-debugfs: factor out a helper to register debugfs for all rq_qos Yu Kuai
                   ` (8 more replies)
  0 siblings, 9 replies; 19+ messages in thread
From: Yu Kuai @ 2025-11-21  6:28 UTC (permalink / raw)
  To: axboe, linux-block, tj, nilay, ming.lei, bvanassche; +Cc: yukuai

changes in v2:
 - combine two set into one;

Yu Kuai (9):
  blk-mq-debugfs: factor out a helper to register debugfs for all rq_qos
  blk-rq-qos: fix possible debugfs_mutex deadlock
  blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static
  blk-mq-debugfs: warn about possible deadlock
  block/blk-rq-qos: add a new helper rq_qos_add_frozen()
  blk-wbt: fix incorrect lock order for rq_qos_mutex and freeze queue
  blk-iocost: fix incorrect lock order for rq_qos_mutex and freeze queue
  blk-iolatency: fix incorrect lock order for rq_qos_mutex and freeze
    queue
  block/blk-rq-qos: cleanup rq_qos_add()

 block/blk-cgroup.c     | 38 ++++++++++++++++++++-------
 block/blk-iocost.c     | 15 +++++------
 block/blk-iolatency.c  | 11 +++++---
 block/blk-mq-debugfs.c | 58 ++++++++++++++++++++++++++++++------------
 block/blk-mq-debugfs.h |  4 +--
 block/blk-rq-qos.c     | 27 ++++----------------
 block/blk-sysfs.c      |  4 +++
 block/blk-wbt.c        | 10 +++++++-
 8 files changed, 105 insertions(+), 62 deletions(-)

-- 
2.51.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2 1/9] blk-mq-debugfs: factor out a helper to register debugfs for all rq_qos
  2025-11-21  6:28 [PATCH v2 0/9] blk-mq: fix possible deadlocks Yu Kuai
@ 2025-11-21  6:28 ` Yu Kuai
  2025-11-21  6:28 ` [PATCH v2 2/9] blk-rq-qos: fix possible debugfs_mutex deadlock Yu Kuai
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Yu Kuai @ 2025-11-21  6:28 UTC (permalink / raw)
  To: axboe, linux-block, tj, nilay, ming.lei, bvanassche; +Cc: yukuai

There is already a helper blk_mq_debugfs_register_rqos() to register
one rqos, however this helper is called synchronously when the rqos is
created with queue frozen.

Prepare to fix possible deadlock to create blk-mq debugfs entries while
queue is still frozen.

Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
 block/blk-mq-debugfs.c | 23 +++++++++++++++--------
 block/blk-mq-debugfs.h |  5 +++++
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 4896525b1c05..128d2aa6a20d 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -619,6 +619,20 @@ static void debugfs_create_files(struct dentry *parent, void *data,
 				    (void *)attr, data, &blk_mq_debugfs_fops);
 }
 
+void blk_mq_debugfs_register_rq_qos(struct request_queue *q)
+{
+	lockdep_assert_held(&q->debugfs_mutex);
+
+	if (q->rq_qos) {
+		struct rq_qos *rqos = q->rq_qos;
+
+		while (rqos) {
+			blk_mq_debugfs_register_rqos(rqos);
+			rqos = rqos->next;
+		}
+	}
+}
+
 void blk_mq_debugfs_register(struct request_queue *q)
 {
 	struct blk_mq_hw_ctx *hctx;
@@ -631,14 +645,7 @@ void blk_mq_debugfs_register(struct request_queue *q)
 			blk_mq_debugfs_register_hctx(q, hctx);
 	}
 
-	if (q->rq_qos) {
-		struct rq_qos *rqos = q->rq_qos;
-
-		while (rqos) {
-			blk_mq_debugfs_register_rqos(rqos);
-			rqos = rqos->next;
-		}
-	}
+	blk_mq_debugfs_register_rq_qos(q);
 }
 
 static void blk_mq_debugfs_register_ctx(struct blk_mq_hw_ctx *hctx,
diff --git a/block/blk-mq-debugfs.h b/block/blk-mq-debugfs.h
index c80e453e3014..54948a266889 100644
--- a/block/blk-mq-debugfs.h
+++ b/block/blk-mq-debugfs.h
@@ -33,6 +33,7 @@ void blk_mq_debugfs_register_sched_hctx(struct request_queue *q,
 				       struct blk_mq_hw_ctx *hctx);
 void blk_mq_debugfs_unregister_sched_hctx(struct blk_mq_hw_ctx *hctx);
 
+void blk_mq_debugfs_register_rq_qos(struct request_queue *q);
 void blk_mq_debugfs_register_rqos(struct rq_qos *rqos);
 void blk_mq_debugfs_unregister_rqos(struct rq_qos *rqos);
 #else
@@ -78,6 +79,10 @@ static inline void blk_mq_debugfs_register_rqos(struct rq_qos *rqos)
 {
 }
 
+static inline void blk_mq_debugfs_register_rq_qos(struct request_queue *q)
+{
+}
+
 static inline void blk_mq_debugfs_unregister_rqos(struct rq_qos *rqos)
 {
 }
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 2/9] blk-rq-qos: fix possible debugfs_mutex deadlock
  2025-11-21  6:28 [PATCH v2 0/9] blk-mq: fix possible deadlocks Yu Kuai
  2025-11-21  6:28 ` [PATCH v2 1/9] blk-mq-debugfs: factor out a helper to register debugfs for all rq_qos Yu Kuai
@ 2025-11-21  6:28 ` Yu Kuai
  2025-11-21 10:40   ` Nilay Shroff
  2025-11-21  6:28 ` [PATCH v2 3/9] blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static Yu Kuai
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Yu Kuai @ 2025-11-21  6:28 UTC (permalink / raw)
  To: axboe, linux-block, tj, nilay, ming.lei, bvanassche; +Cc: yukuai

Currently rq-qos debugfs entries are created from rq_qos_add(), while
rq_qos_add() can be called while queue is still frozen. This can
deadlock because creating new entries can trigger fs reclaim.

Fix this problem by delaying creating rq-qos debugfs entries after queue
is unfrozen.

- For wbt, 1) it can be initialized by default, fix it by calling new
  helper after wbt_init() from wbt_enable_default; 2) it can be
  initialized by sysfs, fix it by calling new helper after queue is
  unfrozen from queue_wb_lat_store().
- For iocost and iolatency, they can only be initialized by blkcg
  configuration, fix by calling the new helper from blkg_conf_end().

Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
 block/blk-cgroup.c | 38 +++++++++++++++++++++++++++++---------
 block/blk-rq-qos.c |  7 -------
 block/blk-sysfs.c  |  4 ++++
 block/blk-wbt.c    |  6 +++++-
 4 files changed, 38 insertions(+), 17 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 3cffb68ba5d8..ad3bd43b2859 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -33,6 +33,7 @@
 #include "blk-cgroup.h"
 #include "blk-ioprio.h"
 #include "blk-throttle.h"
+#include "blk-mq-debugfs.h"
 
 static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu);
 
@@ -966,14 +967,7 @@ int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
 }
 EXPORT_SYMBOL_GPL(blkg_conf_prep);
 
-/**
- * blkg_conf_exit - clean up per-blkg config update
- * @ctx: blkg_conf_ctx initialized with blkg_conf_init()
- *
- * Clean up after per-blkg config update. This function must be called on all
- * blkg_conf_ctx's initialized with blkg_conf_init().
- */
-void blkg_conf_exit(struct blkg_conf_ctx *ctx)
+static void __blkg_conf_exit(struct blkg_conf_ctx *ctx)
 	__releases(&ctx->bdev->bd_queue->queue_lock)
 	__releases(&ctx->bdev->bd_queue->rq_qos_mutex)
 {
@@ -986,6 +980,26 @@ void blkg_conf_exit(struct blkg_conf_ctx *ctx)
 		mutex_unlock(&ctx->bdev->bd_queue->rq_qos_mutex);
 		blkdev_put_no_open(ctx->bdev);
 		ctx->body = NULL;
+	}
+}
+
+/**
+ * blkg_conf_exit - clean up per-blkg config update
+ * @ctx: blkg_conf_ctx initialized with blkg_conf_init()
+ *
+ * Clean up after per-blkg config update. This function must be called on all
+ * blkg_conf_ctx's initialized with blkg_conf_init().
+ */
+void blkg_conf_exit(struct blkg_conf_ctx *ctx)
+{
+	__blkg_conf_exit(ctx);
+	if (ctx->bdev) {
+		struct request_queue *q = ctx->bdev->bd_queue;
+
+		mutex_lock(&q->debugfs_mutex);
+		blk_mq_debugfs_register_rq_qos(q);
+		mutex_unlock(&q->debugfs_mutex);
+
 		ctx->bdev = NULL;
 	}
 }
@@ -1000,8 +1014,14 @@ void blkg_conf_exit_frozen(struct blkg_conf_ctx *ctx, unsigned long memflags)
 	if (ctx->bdev) {
 		struct request_queue *q = ctx->bdev->bd_queue;
 
-		blkg_conf_exit(ctx);
+		__blkg_conf_exit(ctx);
 		blk_mq_unfreeze_queue(q, memflags);
+
+		mutex_lock(&q->debugfs_mutex);
+		blk_mq_debugfs_register_rq_qos(q);
+		mutex_unlock(&q->debugfs_mutex);
+
+		ctx->bdev = NULL;
 	}
 }
 
diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index 654478dfbc20..d7ce99ce2e80 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -347,13 +347,6 @@ int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
 	blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q);
 
 	blk_mq_unfreeze_queue(q, memflags);
-
-	if (rqos->ops->debugfs_attrs) {
-		mutex_lock(&q->debugfs_mutex);
-		blk_mq_debugfs_register_rqos(rqos);
-		mutex_unlock(&q->debugfs_mutex);
-	}
-
 	return 0;
 ebusy:
 	blk_mq_unfreeze_queue(q, memflags);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 5c2d29ac6570..fb165f2977e6 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -722,6 +722,10 @@ static ssize_t queue_wb_lat_store(struct gendisk *disk, const char *page,
 out:
 	blk_mq_unfreeze_queue(q, memflags);
 
+	mutex_lock(&q->debugfs_mutex);
+	blk_mq_debugfs_register_rq_qos(q);
+	mutex_unlock(&q->debugfs_mutex);
+
 	return ret;
 }
 
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index eb8037bae0bd..b1ab0f297f24 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -724,8 +724,12 @@ void wbt_enable_default(struct gendisk *disk)
 	if (!blk_queue_registered(q))
 		return;
 
-	if (queue_is_mq(q) && enable)
+	if (queue_is_mq(q) && enable) {
 		wbt_init(disk);
+		mutex_lock(&q->debugfs_mutex);
+		blk_mq_debugfs_register_rq_qos(q);
+		mutex_unlock(&q->debugfs_mutex);
+	}
 }
 EXPORT_SYMBOL_GPL(wbt_enable_default);
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 3/9] blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static
  2025-11-21  6:28 [PATCH v2 0/9] blk-mq: fix possible deadlocks Yu Kuai
  2025-11-21  6:28 ` [PATCH v2 1/9] blk-mq-debugfs: factor out a helper to register debugfs for all rq_qos Yu Kuai
  2025-11-21  6:28 ` [PATCH v2 2/9] blk-rq-qos: fix possible debugfs_mutex deadlock Yu Kuai
@ 2025-11-21  6:28 ` Yu Kuai
  2025-11-21 10:46   ` Nilay Shroff
  2025-11-21  6:28 ` [PATCH v2 4/9] blk-mq-debugfs: warn about possible deadlock Yu Kuai
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Yu Kuai @ 2025-11-21  6:28 UTC (permalink / raw)
  To: axboe, linux-block, tj, nilay, ming.lei, bvanassche; +Cc: yukuai

Because it's only used inside blk-mq-debugfs.c now.

Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
 block/blk-mq-debugfs.c | 4 +++-
 block/blk-mq-debugfs.h | 5 -----
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 128d2aa6a20d..99466595c0a4 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -14,6 +14,8 @@
 #include "blk-mq-sched.h"
 #include "blk-rq-qos.h"
 
+static void blk_mq_debugfs_register_rqos(struct rq_qos *rqos);
+
 static int queue_poll_stat_show(void *data, struct seq_file *m)
 {
 	return 0;
@@ -758,7 +760,7 @@ void blk_mq_debugfs_unregister_rqos(struct rq_qos *rqos)
 	rqos->debugfs_dir = NULL;
 }
 
-void blk_mq_debugfs_register_rqos(struct rq_qos *rqos)
+static void blk_mq_debugfs_register_rqos(struct rq_qos *rqos)
 {
 	struct request_queue *q = rqos->disk->queue;
 	const char *dir_name = rq_qos_id_to_name(rqos->id);
diff --git a/block/blk-mq-debugfs.h b/block/blk-mq-debugfs.h
index 54948a266889..d94daa66556b 100644
--- a/block/blk-mq-debugfs.h
+++ b/block/blk-mq-debugfs.h
@@ -34,7 +34,6 @@ void blk_mq_debugfs_register_sched_hctx(struct request_queue *q,
 void blk_mq_debugfs_unregister_sched_hctx(struct blk_mq_hw_ctx *hctx);
 
 void blk_mq_debugfs_register_rq_qos(struct request_queue *q);
-void blk_mq_debugfs_register_rqos(struct rq_qos *rqos);
 void blk_mq_debugfs_unregister_rqos(struct rq_qos *rqos);
 #else
 static inline void blk_mq_debugfs_register(struct request_queue *q)
@@ -75,10 +74,6 @@ static inline void blk_mq_debugfs_unregister_sched_hctx(struct blk_mq_hw_ctx *hc
 {
 }
 
-static inline void blk_mq_debugfs_register_rqos(struct rq_qos *rqos)
-{
-}
-
 static inline void blk_mq_debugfs_register_rq_qos(struct request_queue *q)
 {
 }
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 4/9] blk-mq-debugfs: warn about possible deadlock
  2025-11-21  6:28 [PATCH v2 0/9] blk-mq: fix possible deadlocks Yu Kuai
                   ` (2 preceding siblings ...)
  2025-11-21  6:28 ` [PATCH v2 3/9] blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static Yu Kuai
@ 2025-11-21  6:28 ` Yu Kuai
  2025-11-21 23:32   ` kernel test robot
  2025-11-22  2:50   ` kernel test robot
  2025-11-21  6:28 ` [PATCH v2 5/9] block/blk-rq-qos: add a new helper rq_qos_add_frozen() Yu Kuai
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 19+ messages in thread
From: Yu Kuai @ 2025-11-21  6:28 UTC (permalink / raw)
  To: axboe, linux-block, tj, nilay, ming.lei, bvanassche; +Cc: yukuai

Creating new debugfs entries can trigger fs reclaim, hence we can't do
this with queue frozen, meanwhile, other locks that can be held while
queue is frozen should not be held as well.

Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
 block/blk-mq-debugfs.c | 31 ++++++++++++++++++++++++-------
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 99466595c0a4..39004603a670 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -610,9 +610,23 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_ctx_attrs[] = {
 	{},
 };
 
-static void debugfs_create_files(struct dentry *parent, void *data,
+static void debugfs_create_files(struct request_queue *q, struct dentry *parent,
+				 void *data,
 				 const struct blk_mq_debugfs_attr *attr)
 {
+	/*
+	 * Creating new debugfs entries with queue freezed has the risk of
+	 * deadlock.
+	 */
+	WARN_ON_ONCE(q->mq_freeze_depth != 0);
+	/*
+	 * debugfs_mutex should not be nested under other locks that can be
+	 * grabbed while queue is frozen.
+	 */
+	lockdep_assert_not_held(&q->elevator_lock);
+	lockdep_assert_not_held(&q->rq_qos_mutex);
+	lockdep_assert_not_held(&q->blkcg_mutex);
+
 	if (IS_ERR_OR_NULL(parent))
 		return;
 
@@ -640,7 +654,7 @@ void blk_mq_debugfs_register(struct request_queue *q)
 	struct blk_mq_hw_ctx *hctx;
 	unsigned long i;
 
-	debugfs_create_files(q->debugfs_dir, q, blk_mq_debugfs_queue_attrs);
+	debugfs_create_files(q, q->debugfs_dir, q, blk_mq_debugfs_queue_attrs);
 
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if (!hctx->debugfs_dir)
@@ -659,7 +673,8 @@ static void blk_mq_debugfs_register_ctx(struct blk_mq_hw_ctx *hctx,
 	snprintf(name, sizeof(name), "cpu%u", ctx->cpu);
 	ctx_dir = debugfs_create_dir(name, hctx->debugfs_dir);
 
-	debugfs_create_files(ctx_dir, ctx, blk_mq_debugfs_ctx_attrs);
+	debugfs_create_files(hctx->queue, ctx_dir, ctx,
+			     blk_mq_debugfs_ctx_attrs);
 }
 
 void blk_mq_debugfs_register_hctx(struct request_queue *q,
@@ -675,7 +690,8 @@ void blk_mq_debugfs_register_hctx(struct request_queue *q,
 	snprintf(name, sizeof(name), "hctx%u", hctx->queue_num);
 	hctx->debugfs_dir = debugfs_create_dir(name, q->debugfs_dir);
 
-	debugfs_create_files(hctx->debugfs_dir, hctx, blk_mq_debugfs_hctx_attrs);
+	debugfs_create_files(q, hctx->debugfs_dir, hctx,
+			     blk_mq_debugfs_hctx_attrs);
 
 	hctx_for_each_ctx(hctx, ctx, i)
 		blk_mq_debugfs_register_ctx(hctx, ctx);
@@ -726,7 +742,7 @@ void blk_mq_debugfs_register_sched(struct request_queue *q)
 
 	q->sched_debugfs_dir = debugfs_create_dir("sched", q->debugfs_dir);
 
-	debugfs_create_files(q->sched_debugfs_dir, q, e->queue_debugfs_attrs);
+	debugfs_create_files(q, q->sched_debugfs_dir, q, e->queue_debugfs_attrs);
 }
 
 void blk_mq_debugfs_unregister_sched(struct request_queue *q)
@@ -775,7 +791,8 @@ static void blk_mq_debugfs_register_rqos(struct rq_qos *rqos)
 							 q->debugfs_dir);
 
 	rqos->debugfs_dir = debugfs_create_dir(dir_name, q->rqos_debugfs_dir);
-	debugfs_create_files(rqos->debugfs_dir, rqos, rqos->ops->debugfs_attrs);
+	debugfs_create_files(q, rqos->debugfs_dir, rqos,
+			     rqos->ops->debugfs_attrs);
 }
 
 void blk_mq_debugfs_register_sched_hctx(struct request_queue *q,
@@ -798,7 +815,7 @@ void blk_mq_debugfs_register_sched_hctx(struct request_queue *q,
 
 	hctx->sched_debugfs_dir = debugfs_create_dir("sched",
 						     hctx->debugfs_dir);
-	debugfs_create_files(hctx->sched_debugfs_dir, hctx,
+	debugfs_create_files(q, hctx->sched_debugfs_dir, hctx,
 			     e->hctx_debugfs_attrs);
 }
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 5/9] block/blk-rq-qos: add a new helper rq_qos_add_frozen()
  2025-11-21  6:28 [PATCH v2 0/9] blk-mq: fix possible deadlocks Yu Kuai
                   ` (3 preceding siblings ...)
  2025-11-21  6:28 ` [PATCH v2 4/9] blk-mq-debugfs: warn about possible deadlock Yu Kuai
@ 2025-11-21  6:28 ` Yu Kuai
  2025-11-21  6:28 ` [PATCH v2 6/9] blk-wbt: fix incorrect lock order for rq_qos_mutex and freeze queue Yu Kuai
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Yu Kuai @ 2025-11-21  6:28 UTC (permalink / raw)
  To: axboe, linux-block, tj, nilay, ming.lei, bvanassche; +Cc: yukuai

queue should not be frozen under rq_qos_mutex, see example from
commit 9730763f4756 ("block: correct locking order for protecting blk-wbt
parameters"), which means current implementation of rq_qos_add() is
problematic. Add a new helper and prepare to fix this problem in
following patches.

Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
 block/blk-rq-qos.c | 21 +++++++++++++++++++++
 block/blk-rq-qos.h |  2 ++
 2 files changed, 23 insertions(+)

diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index d7ce99ce2e80..c1a94c2a9742 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -322,6 +322,27 @@ void rq_qos_exit(struct request_queue *q)
 	mutex_unlock(&q->rq_qos_mutex);
 }
 
+int rq_qos_add_frozen(struct rq_qos *rqos, struct gendisk *disk,
+		      enum rq_qos_id id, const struct rq_qos_ops *ops)
+{
+	struct request_queue *q = disk->queue;
+
+	WARN_ON_ONCE(q->mq_freeze_depth == 0);
+	lockdep_assert_held(&q->rq_qos_mutex);
+
+	if (rq_qos_id(q, id))
+		return -EBUSY;
+
+	rqos->disk = disk;
+	rqos->id = id;
+	rqos->ops = ops;
+	rqos->next = q->rq_qos;
+	q->rq_qos = rqos;
+	blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q);
+
+	return 0;
+}
+
 int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
 		const struct rq_qos_ops *ops)
 {
diff --git a/block/blk-rq-qos.h b/block/blk-rq-qos.h
index b538f2c0febc..8d9fb10ae526 100644
--- a/block/blk-rq-qos.h
+++ b/block/blk-rq-qos.h
@@ -87,6 +87,8 @@ static inline void rq_wait_init(struct rq_wait *rq_wait)
 
 int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
 		const struct rq_qos_ops *ops);
+int rq_qos_add_frozen(struct rq_qos *rqos, struct gendisk *disk,
+		      enum rq_qos_id id, const struct rq_qos_ops *ops);
 void rq_qos_del(struct rq_qos *rqos);
 
 typedef bool (acquire_inflight_cb_t)(struct rq_wait *rqw, void *private_data);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 6/9] blk-wbt: fix incorrect lock order for rq_qos_mutex and freeze queue
  2025-11-21  6:28 [PATCH v2 0/9] blk-mq: fix possible deadlocks Yu Kuai
                   ` (4 preceding siblings ...)
  2025-11-21  6:28 ` [PATCH v2 5/9] block/blk-rq-qos: add a new helper rq_qos_add_frozen() Yu Kuai
@ 2025-11-21  6:28 ` Yu Kuai
  2025-11-21  6:28 ` [PATCH v2 7/9] blk-iocost: " Yu Kuai
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Yu Kuai @ 2025-11-21  6:28 UTC (permalink / raw)
  To: axboe, linux-block, tj, nilay, ming.lei, bvanassche; +Cc: yukuai

wbt_init() can be called from sysfs attribute and wbt_enable_default(),
however the lock order are inversely.

- queue_wb_lat_store() freeze queue first, and then wbt_init() hold
  rq_qos_mutex. In this case queue will be frozen again inside
  rq_qos_add(), however, in this case freeze queue recursively is
  inoperative;
- wbt_enable_default() from elevator switch will hold rq_qos_mutex
  first, and then rq_qos_add() will freeze queue;

Fix this problem by converting to use new helper rq_qos_add_frozen() in
wbt_init(), and for wbt_enable_default(), freeze queue before calling
wbt_init().

Fixes: a13bd91be223 ("block/rq_qos: protect rq_qos apis with a new lock")
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
 block/blk-wbt.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index b1ab0f297f24..5e7e481103a1 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -725,7 +725,11 @@ void wbt_enable_default(struct gendisk *disk)
 		return;
 
 	if (queue_is_mq(q) && enable) {
+		unsigned int memflags = blk_mq_freeze_queue(q);
+
 		wbt_init(disk);
+		blk_mq_unfreeze_queue(q, memflags);
+
 		mutex_lock(&q->debugfs_mutex);
 		blk_mq_debugfs_register_rq_qos(q);
 		mutex_unlock(&q->debugfs_mutex);
@@ -926,7 +930,7 @@ int wbt_init(struct gendisk *disk)
 	 * Assign rwb and add the stats callback.
 	 */
 	mutex_lock(&q->rq_qos_mutex);
-	ret = rq_qos_add(&rwb->rqos, disk, RQ_QOS_WBT, &wbt_rqos_ops);
+	ret = rq_qos_add_frozen(&rwb->rqos, disk, RQ_QOS_WBT, &wbt_rqos_ops);
 	mutex_unlock(&q->rq_qos_mutex);
 	if (ret)
 		goto err_free;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 7/9] blk-iocost: fix incorrect lock order for rq_qos_mutex and freeze queue
  2025-11-21  6:28 [PATCH v2 0/9] blk-mq: fix possible deadlocks Yu Kuai
                   ` (5 preceding siblings ...)
  2025-11-21  6:28 ` [PATCH v2 6/9] blk-wbt: fix incorrect lock order for rq_qos_mutex and freeze queue Yu Kuai
@ 2025-11-21  6:28 ` Yu Kuai
  2025-11-21  6:28 ` [PATCH v2 8/9] blk-iolatency: " Yu Kuai
  2025-11-21  6:28 ` [PATCH v2 9/9] block/blk-rq-qos: cleanup rq_qos_add() Yu Kuai
  8 siblings, 0 replies; 19+ messages in thread
From: Yu Kuai @ 2025-11-21  6:28 UTC (permalink / raw)
  To: axboe, linux-block, tj, nilay, ming.lei, bvanassche; +Cc: yukuai

Like wbt, rq_qos_add() can be called from two path and the lock order
are inversely:

- From ioc_qos_write(), queue is already frozen before rq_qos_add();
- From ioc_cost_model_write(), rq_qos_add() is called directly;

Fix this problem by converting to use blkg_conf_open_bdev_frozen()
from ioc_cost_model_write(), then since all rq_qos_add() callers
already freeze queue, convert to use rq_qos_add_frozen().

Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
 block/blk-iocost.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 5bfd70311359..c375a7e19789 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -2927,7 +2927,7 @@ static int blk_iocost_init(struct gendisk *disk)
 	 * called before policy activation completion, can't assume that the
 	 * target bio has an iocg associated and need to test for NULL iocg.
 	 */
-	ret = rq_qos_add(&ioc->rqos, disk, RQ_QOS_COST, &ioc_rqos_ops);
+	ret = rq_qos_add_frozen(&ioc->rqos, disk, RQ_QOS_COST, &ioc_rqos_ops);
 	if (ret)
 		goto err_free_ioc;
 
@@ -3410,7 +3410,7 @@ static ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input,
 {
 	struct blkg_conf_ctx ctx;
 	struct request_queue *q;
-	unsigned int memflags;
+	unsigned long memflags;
 	struct ioc *ioc;
 	u64 u[NR_I_LCOEFS];
 	bool user;
@@ -3419,9 +3419,11 @@ static ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input,
 
 	blkg_conf_init(&ctx, input);
 
-	ret = blkg_conf_open_bdev(&ctx);
-	if (ret)
+	memflags = blkg_conf_open_bdev_frozen(&ctx);
+	if (IS_ERR_VALUE(memflags)) {
+		ret = memflags;
 		goto err;
+	}
 
 	body = ctx.body;
 	q = bdev_get_queue(ctx.bdev);
@@ -3438,7 +3440,6 @@ static ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input,
 		ioc = q_to_ioc(q);
 	}
 
-	memflags = blk_mq_freeze_queue(q);
 	blk_mq_quiesce_queue(q);
 
 	spin_lock_irq(&ioc->lock);
@@ -3490,20 +3491,18 @@ static ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input,
 	spin_unlock_irq(&ioc->lock);
 
 	blk_mq_unquiesce_queue(q);
-	blk_mq_unfreeze_queue(q, memflags);
 
-	blkg_conf_exit(&ctx);
+	blkg_conf_exit_frozen(&ctx, memflags);
 	return nbytes;
 
 einval:
 	spin_unlock_irq(&ioc->lock);
 
 	blk_mq_unquiesce_queue(q);
-	blk_mq_unfreeze_queue(q, memflags);
 
 	ret = -EINVAL;
 err:
-	blkg_conf_exit(&ctx);
+	blkg_conf_exit_frozen(&ctx, memflags);
 	return ret;
 }
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 8/9] blk-iolatency: fix incorrect lock order for rq_qos_mutex and freeze queue
  2025-11-21  6:28 [PATCH v2 0/9] blk-mq: fix possible deadlocks Yu Kuai
                   ` (6 preceding siblings ...)
  2025-11-21  6:28 ` [PATCH v2 7/9] blk-iocost: " Yu Kuai
@ 2025-11-21  6:28 ` Yu Kuai
  2025-11-21 10:53   ` Nilay Shroff
  2025-11-21  6:28 ` [PATCH v2 9/9] block/blk-rq-qos: cleanup rq_qos_add() Yu Kuai
  8 siblings, 1 reply; 19+ messages in thread
From: Yu Kuai @ 2025-11-21  6:28 UTC (permalink / raw)
  To: axboe, linux-block, tj, nilay, ming.lei, bvanassche; +Cc: yukuai

Currently blk-iolatency will hold rq_qos_mutex first and then call
rq_qos_add() to freeze queue.

Fix this problem by converting to use blkg_conf_open_bdev_frozen()
from iolatency_set_limit(), and convert to use rq_qos_add_frozen().

Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
 block/blk-iolatency.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index 45bd18f68541..1558afbf517b 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -764,8 +764,8 @@ static int blk_iolatency_init(struct gendisk *disk)
 	if (!blkiolat)
 		return -ENOMEM;
 
-	ret = rq_qos_add(&blkiolat->rqos, disk, RQ_QOS_LATENCY,
-			 &blkcg_iolatency_ops);
+	ret = rq_qos_add_frozen(&blkiolat->rqos, disk, RQ_QOS_LATENCY,
+				&blkcg_iolatency_ops);
 	if (ret)
 		goto err_free;
 	ret = blkcg_activate_policy(disk, &blkcg_policy_iolatency);
@@ -831,16 +831,19 @@ static ssize_t iolatency_set_limit(struct kernfs_open_file *of, char *buf,
 	struct blkcg_gq *blkg;
 	struct blkg_conf_ctx ctx;
 	struct iolatency_grp *iolat;
+	unsigned long memflags;
 	char *p, *tok;
 	u64 lat_val = 0;
 	u64 oldval;
-	int ret;
+	int ret = 0;
 
 	blkg_conf_init(&ctx, buf);
 
-	ret = blkg_conf_open_bdev(&ctx);
-	if (ret)
+	memflags = blkg_conf_open_bdev_frozen(&ctx);
+	if (IS_ERR_VALUE(memflags)) {
+		ret = memflags;
 		goto out;
+	}
 
 	/*
 	 * blk_iolatency_init() may fail after rq_qos_add() succeeds which can
@@ -890,7 +893,7 @@ static ssize_t iolatency_set_limit(struct kernfs_open_file *of, char *buf,
 		iolatency_clear_scaling(blkg);
 	ret = 0;
 out:
-	blkg_conf_exit(&ctx);
+	blkg_conf_exit_frozen(&ctx, memflags);
 	return ret ?: nbytes;
 }
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 9/9] block/blk-rq-qos: cleanup rq_qos_add()
  2025-11-21  6:28 [PATCH v2 0/9] blk-mq: fix possible deadlocks Yu Kuai
                   ` (7 preceding siblings ...)
  2025-11-21  6:28 ` [PATCH v2 8/9] blk-iolatency: " Yu Kuai
@ 2025-11-21  6:28 ` Yu Kuai
  8 siblings, 0 replies; 19+ messages in thread
From: Yu Kuai @ 2025-11-21  6:28 UTC (permalink / raw)
  To: axboe, linux-block, tj, nilay, ming.lei, bvanassche; +Cc: yukuai

Now that there is no caller of rq_qos_add(), remove it, and also rename
rq_qos_add_frozen() back to rq_qos_add().

Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
 block/blk-iocost.c    |  2 +-
 block/blk-iolatency.c |  4 ++--
 block/blk-rq-qos.c    | 35 ++---------------------------------
 block/blk-rq-qos.h    |  2 --
 block/blk-wbt.c       |  2 +-
 5 files changed, 6 insertions(+), 39 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index c375a7e19789..0948f628386f 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -2927,7 +2927,7 @@ static int blk_iocost_init(struct gendisk *disk)
 	 * called before policy activation completion, can't assume that the
 	 * target bio has an iocg associated and need to test for NULL iocg.
 	 */
-	ret = rq_qos_add_frozen(&ioc->rqos, disk, RQ_QOS_COST, &ioc_rqos_ops);
+	ret = rq_qos_add(&ioc->rqos, disk, RQ_QOS_COST, &ioc_rqos_ops);
 	if (ret)
 		goto err_free_ioc;
 
diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index 1558afbf517b..5b18125e21c9 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -764,8 +764,8 @@ static int blk_iolatency_init(struct gendisk *disk)
 	if (!blkiolat)
 		return -ENOMEM;
 
-	ret = rq_qos_add_frozen(&blkiolat->rqos, disk, RQ_QOS_LATENCY,
-				&blkcg_iolatency_ops);
+	ret = rq_qos_add(&blkiolat->rqos, disk, RQ_QOS_LATENCY,
+			 &blkcg_iolatency_ops);
 	if (ret)
 		goto err_free;
 	ret = blkcg_activate_policy(disk, &blkcg_policy_iolatency);
diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index c1a94c2a9742..8de7dae3273e 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -322,8 +322,8 @@ void rq_qos_exit(struct request_queue *q)
 	mutex_unlock(&q->rq_qos_mutex);
 }
 
-int rq_qos_add_frozen(struct rq_qos *rqos, struct gendisk *disk,
-		      enum rq_qos_id id, const struct rq_qos_ops *ops)
+int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
+	       const struct rq_qos_ops *ops)
 {
 	struct request_queue *q = disk->queue;
 
@@ -343,37 +343,6 @@ int rq_qos_add_frozen(struct rq_qos *rqos, struct gendisk *disk,
 	return 0;
 }
 
-int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
-		const struct rq_qos_ops *ops)
-{
-	struct request_queue *q = disk->queue;
-	unsigned int memflags;
-
-	lockdep_assert_held(&q->rq_qos_mutex);
-
-	rqos->disk = disk;
-	rqos->id = id;
-	rqos->ops = ops;
-
-	/*
-	 * No IO can be in-flight when adding rqos, so freeze queue, which
-	 * is fine since we only support rq_qos for blk-mq queue.
-	 */
-	memflags = blk_mq_freeze_queue(q);
-
-	if (rq_qos_id(q, rqos->id))
-		goto ebusy;
-	rqos->next = q->rq_qos;
-	q->rq_qos = rqos;
-	blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q);
-
-	blk_mq_unfreeze_queue(q, memflags);
-	return 0;
-ebusy:
-	blk_mq_unfreeze_queue(q, memflags);
-	return -EBUSY;
-}
-
 void rq_qos_del(struct rq_qos *rqos)
 {
 	struct request_queue *q = rqos->disk->queue;
diff --git a/block/blk-rq-qos.h b/block/blk-rq-qos.h
index 8d9fb10ae526..b538f2c0febc 100644
--- a/block/blk-rq-qos.h
+++ b/block/blk-rq-qos.h
@@ -87,8 +87,6 @@ static inline void rq_wait_init(struct rq_wait *rq_wait)
 
 int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
 		const struct rq_qos_ops *ops);
-int rq_qos_add_frozen(struct rq_qos *rqos, struct gendisk *disk,
-		      enum rq_qos_id id, const struct rq_qos_ops *ops);
 void rq_qos_del(struct rq_qos *rqos);
 
 typedef bool (acquire_inflight_cb_t)(struct rq_wait *rqw, void *private_data);
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index 5e7e481103a1..0f90d9a97ef4 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -930,7 +930,7 @@ int wbt_init(struct gendisk *disk)
 	 * Assign rwb and add the stats callback.
 	 */
 	mutex_lock(&q->rq_qos_mutex);
-	ret = rq_qos_add_frozen(&rwb->rqos, disk, RQ_QOS_WBT, &wbt_rqos_ops);
+	ret = rq_qos_add(&rwb->rqos, disk, RQ_QOS_WBT, &wbt_rqos_ops);
 	mutex_unlock(&q->rq_qos_mutex);
 	if (ret)
 		goto err_free;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 2/9] blk-rq-qos: fix possible debugfs_mutex deadlock
  2025-11-21  6:28 ` [PATCH v2 2/9] blk-rq-qos: fix possible debugfs_mutex deadlock Yu Kuai
@ 2025-11-21 10:40   ` Nilay Shroff
  2025-11-21 12:09     ` Yu Kuai
  0 siblings, 1 reply; 19+ messages in thread
From: Nilay Shroff @ 2025-11-21 10:40 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Jens Axboe, linux-block@vger.kernel.org, Tejun Heo, Ming Lei,
	Bart Van Assche



On 11/21/25 11:58 AM, Yu Kuai wrote:
> +void blkg_conf_exit(struct blkg_conf_ctx *ctx)
> +{
> +	__blkg_conf_exit(ctx);
> +	if (ctx->bdev) {
> +		struct request_queue *q = ctx->bdev->bd_queue;
> +
> +		mutex_lock(&q->debugfs_mutex);
> +		blk_mq_debugfs_register_rq_qos(q);
> +		mutex_unlock(&q->debugfs_mutex);
> +
>  		ctx->bdev = NULL;
>  	}
>  }

Why do we need to add debugfs register from blkg_conf_exit() here?
Does any caller using above API, really registers/adds q->rqos? I
see blk-iococst, blk-iolatency and wbt don't use this API.


> @@ -724,8 +724,12 @@ void wbt_enable_default(struct gendisk *disk)
>  	if (!blk_queue_registered(q))
>  		return;
>  
> -	if (queue_is_mq(q) && enable)
> +	if (queue_is_mq(q) && enable) {
>  		wbt_init(disk);
> +		mutex_lock(&q->debugfs_mutex);
> +		blk_mq_debugfs_register_rq_qos(q);
> +		mutex_unlock(&q->debugfs_mutex);
> +	}
>  }
>  EXPORT_SYMBOL_GPL(wbt_enable_default);

I see we do have still one code path left in which the debugfs register 
could be invoked while queue remains freezed:

ioc_qos_write:
blkg_conf_open_bdev_frozen        => freezs the queue
  wbt_enable_default
    blk_mq_debugfs_register_rq_qos
      
Thanks,
--Nilay


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 3/9] blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static
  2025-11-21  6:28 ` [PATCH v2 3/9] blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static Yu Kuai
@ 2025-11-21 10:46   ` Nilay Shroff
  2025-11-21 12:10     ` Yu Kuai
  0 siblings, 1 reply; 19+ messages in thread
From: Nilay Shroff @ 2025-11-21 10:46 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Jens Axboe, linux-block@vger.kernel.org, Tejun Heo, Ming Lei,
	Bart Van Assche



On 11/21/25 11:58 AM, Yu Kuai wrote:
> Because it's only used inside blk-mq-debugfs.c now.
> 
> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
> ---
>  block/blk-mq-debugfs.c | 4 +++-
>  block/blk-mq-debugfs.h | 5 -----
>  2 files changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> index 128d2aa6a20d..99466595c0a4 100644
> --- a/block/blk-mq-debugfs.c
> +++ b/block/blk-mq-debugfs.c
> @@ -14,6 +14,8 @@
>  #include "blk-mq-sched.h"
>  #include "blk-rq-qos.h"
>  
> +static void blk_mq_debugfs_register_rqos(struct rq_qos *rqos);
> +
If you define blk_mq_debugfs_register_rqos() before it is invoked
by blk_mq_debugfs_register_rq_qos(), then this change becomes unnecessary
and can be avoided.

Thanks,
--Nilay

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 8/9] blk-iolatency: fix incorrect lock order for rq_qos_mutex and freeze queue
  2025-11-21  6:28 ` [PATCH v2 8/9] blk-iolatency: " Yu Kuai
@ 2025-11-21 10:53   ` Nilay Shroff
  2025-11-21 12:13     ` Yu Kuai
  0 siblings, 1 reply; 19+ messages in thread
From: Nilay Shroff @ 2025-11-21 10:53 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Jens Axboe, linux-block@vger.kernel.org, Tejun Heo, Ming Lei,
	Bart Van Assche



On 11/21/25 11:58 AM, Yu Kuai wrote:
> Currently blk-iolatency will hold rq_qos_mutex first and then call
> rq_qos_add() to freeze queue.
> 
> Fix this problem by converting to use blkg_conf_open_bdev_frozen()
> from iolatency_set_limit(), and convert to use rq_qos_add_frozen().
> 
> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
> ---

So with this patchset, we have fixed locking order of ->freeze_lock
and ->rq_qos_mutex for blk-iolatency, blk-iocost and wbt. But I still
find following code paths where we still need to fix the locking 
order:

tg_set_limit:
blkg_conf_open_bdev  => acquires ->rq_qos_mutex
  blk_throtl_init    => acquires ->freeze_lock

tg_set_conf:
blkg_conf_open_bdev  => acquires ->rq_qos_mutex
  blk_throtl_init    => acquires ->freeze_lock

Thanks,
--Nilay

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 2/9] blk-rq-qos: fix possible debugfs_mutex deadlock
  2025-11-21 10:40   ` Nilay Shroff
@ 2025-11-21 12:09     ` Yu Kuai
  0 siblings, 0 replies; 19+ messages in thread
From: Yu Kuai @ 2025-11-21 12:09 UTC (permalink / raw)
  To: Nilay Shroff
  Cc: Jens Axboe, linux-block@vger.kernel.org, Tejun Heo, Ming Lei,
	Bart Van Assche, Yu Kuai

Hi,

在 2025/11/21 18:40, Nilay Shroff 写道:
>
> On 11/21/25 11:58 AM, Yu Kuai wrote:
>> +void blkg_conf_exit(struct blkg_conf_ctx *ctx)
>> +{
>> +	__blkg_conf_exit(ctx);
>> +	if (ctx->bdev) {
>> +		struct request_queue *q = ctx->bdev->bd_queue;
>> +
>> +		mutex_lock(&q->debugfs_mutex);
>> +		blk_mq_debugfs_register_rq_qos(q);
>> +		mutex_unlock(&q->debugfs_mutex);
>> +
>>   		ctx->bdev = NULL;
>>   	}
>>   }
> Why do we need to add debugfs register from blkg_conf_exit() here?
> Does any caller using above API, really registers/adds q->rqos? I
> see blk-iococst, blk-iolatency and wbt don't use this API.

Yes, this sounds reasonable. We should add this when we really have
such rq qos policy.

>
>> @@ -724,8 +724,12 @@ void wbt_enable_default(struct gendisk *disk)
>>   	if (!blk_queue_registered(q))
>>   		return;
>>   
>> -	if (queue_is_mq(q) && enable)
>> +	if (queue_is_mq(q) && enable) {
>>   		wbt_init(disk);
>> +		mutex_lock(&q->debugfs_mutex);
>> +		blk_mq_debugfs_register_rq_qos(q);
>> +		mutex_unlock(&q->debugfs_mutex);
>> +	}
>>   }
>>   EXPORT_SYMBOL_GPL(wbt_enable_default);
> I see we do have still one code path left in which the debugfs register
> could be invoked while queue remains freezed:
>
> ioc_qos_write:
> blkg_conf_open_bdev_frozen        => freezs the queue
>    wbt_enable_default
>      blk_mq_debugfs_register_rq_qos
>        
> Thanks,
> --Nilay
>
>

-- 
Thanks,
Kuai

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 3/9] blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static
  2025-11-21 10:46   ` Nilay Shroff
@ 2025-11-21 12:10     ` Yu Kuai
  0 siblings, 0 replies; 19+ messages in thread
From: Yu Kuai @ 2025-11-21 12:10 UTC (permalink / raw)
  To: Nilay Shroff
  Cc: Jens Axboe, linux-block@vger.kernel.org, Tejun Heo, Ming Lei,
	Bart Van Assche, Yu Kuai

Hi,

在 2025/11/21 18:46, Nilay Shroff 写道:
>
> On 11/21/25 11:58 AM, Yu Kuai wrote:
>> Because it's only used inside blk-mq-debugfs.c now.
>>
>> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
>> ---
>>   block/blk-mq-debugfs.c | 4 +++-
>>   block/blk-mq-debugfs.h | 5 -----
>>   2 files changed, 3 insertions(+), 6 deletions(-)
>>
>> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
>> index 128d2aa6a20d..99466595c0a4 100644
>> --- a/block/blk-mq-debugfs.c
>> +++ b/block/blk-mq-debugfs.c
>> @@ -14,6 +14,8 @@
>>   #include "blk-mq-sched.h"
>>   #include "blk-rq-qos.h"
>>   
>> +static void blk_mq_debugfs_register_rqos(struct rq_qos *rqos);
>> +
> If you define blk_mq_debugfs_register_rqos() before it is invoked
> by blk_mq_debugfs_register_rq_qos(), then this change becomes unnecessary
> and can be avoided.

In fact I do this at first, however there will be compile error because
there are still others symbols involved.

>
> Thanks,
> --Nilay
>

-- 
Thanks,
Kuai

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 8/9] blk-iolatency: fix incorrect lock order for rq_qos_mutex and freeze queue
  2025-11-21 10:53   ` Nilay Shroff
@ 2025-11-21 12:13     ` Yu Kuai
  0 siblings, 0 replies; 19+ messages in thread
From: Yu Kuai @ 2025-11-21 12:13 UTC (permalink / raw)
  To: Nilay Shroff
  Cc: Jens Axboe, linux-block@vger.kernel.org, Tejun Heo, Ming Lei,
	Bart Van Assche, Yu Kuai

Hi,

在 2025/11/21 18:53, Nilay Shroff 写道:
>
> On 11/21/25 11:58 AM, Yu Kuai wrote:
>> Currently blk-iolatency will hold rq_qos_mutex first and then call
>> rq_qos_add() to freeze queue.
>>
>> Fix this problem by converting to use blkg_conf_open_bdev_frozen()
>> from iolatency_set_limit(), and convert to use rq_qos_add_frozen().
>>
>> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
>> ---
> So with this patchset, we have fixed locking order of ->freeze_lock
> and ->rq_qos_mutex for blk-iolatency, blk-iocost and wbt. But I still
> find following code paths where we still need to fix the locking
> order:
>
> tg_set_limit:
> blkg_conf_open_bdev  => acquires ->rq_qos_mutex
>    blk_throtl_init    => acquires ->freeze_lock
>
> tg_set_conf:
> blkg_conf_open_bdev  => acquires ->rq_qos_mutex
>    blk_throtl_init    => acquires ->freeze_lock

I think the freeze queue can be removed directly from blk_throtl_init(),
blk-throttle is bio based, and freeze queue can't stop new IO to blk-throttle.

>
> Thanks,
> --Nilay
>

-- 
Thanks,
Kuai

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 4/9] blk-mq-debugfs: warn about possible deadlock
  2025-11-21  6:28 ` [PATCH v2 4/9] blk-mq-debugfs: warn about possible deadlock Yu Kuai
@ 2025-11-21 23:32   ` kernel test robot
  2025-11-22  2:50   ` kernel test robot
  1 sibling, 0 replies; 19+ messages in thread
From: kernel test robot @ 2025-11-21 23:32 UTC (permalink / raw)
  To: Yu Kuai, axboe, linux-block, tj, nilay, ming.lei, bvanassche
  Cc: oe-kbuild-all, yukuai

Hi Yu,

kernel test robot noticed the following build errors:

[auto build test ERROR on axboe/for-next]
[also build test ERROR on linus/master v6.18-rc6 next-20251121]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yu-Kuai/blk-mq-debugfs-factor-out-a-helper-to-register-debugfs-for-all-rq_qos/20251121-143315
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux.git for-next
patch link:    https://lore.kernel.org/r/20251121062829.1433332-5-yukuai%40fnnas.com
patch subject: [PATCH v2 4/9] blk-mq-debugfs: warn about possible deadlock
config: parisc-randconfig-002-20251122 (https://download.01.org/0day-ci/archive/20251122/202511220719.yaFySU2X-lkp@intel.com/config)
compiler: hppa-linux-gcc (GCC) 14.3.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251122/202511220719.yaFySU2X-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202511220719.yaFySU2X-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from include/linux/bug.h:5,
                    from include/linux/vfsdebug.h:5,
                    from include/linux/fs.h:5,
                    from include/linux/highmem.h:5,
                    from include/linux/bvec.h:10,
                    from include/linux/blk_types.h:10,
                    from include/linux/blkdev.h:9,
                    from block/blk-mq-debugfs.c:7:
   block/blk-mq-debugfs.c: In function 'debugfs_create_files':
>> block/blk-mq-debugfs.c:628:35: error: 'struct request_queue' has no member named 'blkcg_mutex'
     628 |         lockdep_assert_not_held(&q->blkcg_mutex);
         |                                   ^~
   arch/parisc/include/asm/bug.h:86:32: note: in definition of macro 'WARN_ON'
      86 |         int __ret_warn_on = !!(x);                              \
         |                                ^
   include/linux/lockdep.h:288:9: note: in expansion of macro 'lockdep_assert'
     288 |         lockdep_assert(lockdep_is_held(l) != LOCK_STATE_HELD)
         |         ^~~~~~~~~~~~~~
   include/linux/lockdep.h:288:24: note: in expansion of macro 'lockdep_is_held'
     288 |         lockdep_assert(lockdep_is_held(l) != LOCK_STATE_HELD)
         |                        ^~~~~~~~~~~~~~~
   block/blk-mq-debugfs.c:628:9: note: in expansion of macro 'lockdep_assert_not_held'
     628 |         lockdep_assert_not_held(&q->blkcg_mutex);
         |         ^~~~~~~~~~~~~~~~~~~~~~~

Kconfig warnings: (for reference only)
   WARNING: unmet direct dependencies detected for OF_GPIO
   Depends on [n]: GPIOLIB [=y] && OF [=n] && HAS_IOMEM [=y]
   Selected by [y]:
   - GPIO_TB10X [=y] && GPIOLIB [=y] && HAS_IOMEM [=y] && (ARC_PLAT_TB10X || COMPILE_TEST [=y])
   WARNING: unmet direct dependencies detected for GPIO_SYSCON
   Depends on [n]: GPIOLIB [=y] && HAS_IOMEM [=y] && MFD_SYSCON [=y] && OF [=n]
   Selected by [y]:
   - GPIO_SAMA5D2_PIOBU [=y] && GPIOLIB [=y] && HAS_IOMEM [=y] && MFD_SYSCON [=y] && OF_GPIO [=y] && (ARCH_AT91 || COMPILE_TEST [=y])
   WARNING: unmet direct dependencies detected for I2C_K1
   Depends on [n]: I2C [=y] && HAS_IOMEM [=y] && (ARCH_SPACEMIT || COMPILE_TEST [=y]) && OF [=n]
   Selected by [y]:
   - MFD_SPACEMIT_P1 [=y] && HAS_IOMEM [=y] && (ARCH_SPACEMIT || COMPILE_TEST [=y]) && I2C [=y]


vim +628 block/blk-mq-debugfs.c

   612	
   613	static void debugfs_create_files(struct request_queue *q, struct dentry *parent,
   614					 void *data,
   615					 const struct blk_mq_debugfs_attr *attr)
   616	{
   617		/*
   618		 * Creating new debugfs entries with queue freezed has the risk of
   619		 * deadlock.
   620		 */
   621		WARN_ON_ONCE(q->mq_freeze_depth != 0);
   622		/*
   623		 * debugfs_mutex should not be nested under other locks that can be
   624		 * grabbed while queue is frozen.
   625		 */
   626		lockdep_assert_not_held(&q->elevator_lock);
   627		lockdep_assert_not_held(&q->rq_qos_mutex);
 > 628		lockdep_assert_not_held(&q->blkcg_mutex);
   629	
   630		if (IS_ERR_OR_NULL(parent))
   631			return;
   632	
   633		for (; attr->name; attr++)
   634			debugfs_create_file_aux(attr->name, attr->mode, parent,
   635					    (void *)attr, data, &blk_mq_debugfs_fops);
   636	}
   637	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 4/9] blk-mq-debugfs: warn about possible deadlock
  2025-11-21  6:28 ` [PATCH v2 4/9] blk-mq-debugfs: warn about possible deadlock Yu Kuai
  2025-11-21 23:32   ` kernel test robot
@ 2025-11-22  2:50   ` kernel test robot
  2025-11-22 16:04     ` Yu Kuai
  1 sibling, 1 reply; 19+ messages in thread
From: kernel test robot @ 2025-11-22  2:50 UTC (permalink / raw)
  To: Yu Kuai, axboe, linux-block, tj, nilay, ming.lei, bvanassche
  Cc: llvm, oe-kbuild-all, yukuai

Hi Yu,

kernel test robot noticed the following build errors:

[auto build test ERROR on axboe/for-next]
[also build test ERROR on linus/master v6.18-rc6 next-20251121]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yu-Kuai/blk-mq-debugfs-factor-out-a-helper-to-register-debugfs-for-all-rq_qos/20251121-143315
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux.git for-next
patch link:    https://lore.kernel.org/r/20251121062829.1433332-5-yukuai%40fnnas.com
patch subject: [PATCH v2 4/9] blk-mq-debugfs: warn about possible deadlock
config: sparc64-defconfig (https://download.01.org/0day-ci/archive/20251122/202511221056.dAY0duWw-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251122/202511221056.dAY0duWw-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202511221056.dAY0duWw-lkp@intel.com/

All errors (new ones prefixed by >>):

>> block/blk-mq-debugfs.c:628:30: error: no member named 'blkcg_mutex' in 'struct request_queue'
     628 |         lockdep_assert_not_held(&q->blkcg_mutex);
         |                                  ~  ^
   include/linux/lockdep.h:393:49: note: expanded from macro 'lockdep_assert_not_held'
     393 | #define lockdep_assert_not_held(l)              do { (void)(l); } while (0)
         |                                                             ^
   1 error generated.


vim +628 block/blk-mq-debugfs.c

   612	
   613	static void debugfs_create_files(struct request_queue *q, struct dentry *parent,
   614					 void *data,
   615					 const struct blk_mq_debugfs_attr *attr)
   616	{
   617		/*
   618		 * Creating new debugfs entries with queue freezed has the risk of
   619		 * deadlock.
   620		 */
   621		WARN_ON_ONCE(q->mq_freeze_depth != 0);
   622		/*
   623		 * debugfs_mutex should not be nested under other locks that can be
   624		 * grabbed while queue is frozen.
   625		 */
   626		lockdep_assert_not_held(&q->elevator_lock);
   627		lockdep_assert_not_held(&q->rq_qos_mutex);
 > 628		lockdep_assert_not_held(&q->blkcg_mutex);
   629	
   630		if (IS_ERR_OR_NULL(parent))
   631			return;
   632	
   633		for (; attr->name; attr++)
   634			debugfs_create_file_aux(attr->name, attr->mode, parent,
   635					    (void *)attr, data, &blk_mq_debugfs_fops);
   636	}
   637	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 4/9] blk-mq-debugfs: warn about possible deadlock
  2025-11-22  2:50   ` kernel test robot
@ 2025-11-22 16:04     ` Yu Kuai
  0 siblings, 0 replies; 19+ messages in thread
From: Yu Kuai @ 2025-11-22 16:04 UTC (permalink / raw)
  To: kernel test robot, axboe, linux-block, tj, nilay, ming.lei,
	bvanassche
  Cc: llvm, oe-kbuild-all

Hi,


在 2025/11/22 10:50, kernel test robot 写道:
> Hi Yu,
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on axboe/for-next]
> [also build test ERROR on linus/master v6.18-rc6 next-20251121]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url:    https://github.com/intel-lab-lkp/linux/commits/Yu-Kuai/blk-mq-debugfs-factor-out-a-helper-to-register-debugfs-for-all-rq_qos/20251121-143315
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux.git for-next
> patch link:    https://lore.kernel.org/r/20251121062829.1433332-5-yukuai%40fnnas.com
> patch subject: [PATCH v2 4/9] blk-mq-debugfs: warn about possible deadlock
> config: sparc64-defconfig (https://download.01.org/0day-ci/archive/20251122/202511221056.dAY0duWw-lkp@intel.com/config)
> compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251122/202511221056.dAY0duWw-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202511221056.dAY0duWw-lkp@intel.com/
>
> All errors (new ones prefixed by >>):
>
>>> block/blk-mq-debugfs.c:628:30: error: no member named 'blkcg_mutex' in 'struct request_queue'
>       628 |         lockdep_assert_not_held(&q->blkcg_mutex);
>           |                                  ~  ^
>     include/linux/lockdep.h:393:49: note: expanded from macro 'lockdep_assert_not_held'
>       393 | #define lockdep_assert_not_held(l)              do { (void)(l); } while (0)
>           |                                                             ^
>     1 error generated.
>
>
> vim +628 block/blk-mq-debugfs.c
>
>     612	
>     613	static void debugfs_create_files(struct request_queue *q, struct dentry *parent,
>     614					 void *data,
>     615					 const struct blk_mq_debugfs_attr *attr)
>     616	{
>     617		/*
>     618		 * Creating new debugfs entries with queue freezed has the risk of
>     619		 * deadlock.
>     620		 */
>     621		WARN_ON_ONCE(q->mq_freeze_depth != 0);
>     622		/*
>     623		 * debugfs_mutex should not be nested under other locks that can be
>     624		 * grabbed while queue is frozen.
>     625		 */
>     626		lockdep_assert_not_held(&q->elevator_lock);
>     627		lockdep_assert_not_held(&q->rq_qos_mutex);
>   > 628		lockdep_assert_not_held(&q->blkcg_mutex);
>     629	
>     630		if (IS_ERR_OR_NULL(parent))
>     631			return;
>     632	
>     633		for (; attr->name; attr++)
>     634			debugfs_create_file_aux(attr->name, attr->mode, parent,
>     635					    (void *)attr, data, &blk_mq_debugfs_fops);
>     636	}
>     637	

Thanks for the test, this set was build on the top of my other thread to introduce
blkcg_mutex, I'll rebase in the next version.


-- 
Thanks,
Kuai

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2025-11-22 16:04 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-21  6:28 [PATCH v2 0/9] blk-mq: fix possible deadlocks Yu Kuai
2025-11-21  6:28 ` [PATCH v2 1/9] blk-mq-debugfs: factor out a helper to register debugfs for all rq_qos Yu Kuai
2025-11-21  6:28 ` [PATCH v2 2/9] blk-rq-qos: fix possible debugfs_mutex deadlock Yu Kuai
2025-11-21 10:40   ` Nilay Shroff
2025-11-21 12:09     ` Yu Kuai
2025-11-21  6:28 ` [PATCH v2 3/9] blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static Yu Kuai
2025-11-21 10:46   ` Nilay Shroff
2025-11-21 12:10     ` Yu Kuai
2025-11-21  6:28 ` [PATCH v2 4/9] blk-mq-debugfs: warn about possible deadlock Yu Kuai
2025-11-21 23:32   ` kernel test robot
2025-11-22  2:50   ` kernel test robot
2025-11-22 16:04     ` Yu Kuai
2025-11-21  6:28 ` [PATCH v2 5/9] block/blk-rq-qos: add a new helper rq_qos_add_frozen() Yu Kuai
2025-11-21  6:28 ` [PATCH v2 6/9] blk-wbt: fix incorrect lock order for rq_qos_mutex and freeze queue Yu Kuai
2025-11-21  6:28 ` [PATCH v2 7/9] blk-iocost: " Yu Kuai
2025-11-21  6:28 ` [PATCH v2 8/9] blk-iolatency: " Yu Kuai
2025-11-21 10:53   ` Nilay Shroff
2025-11-21 12:13     ` Yu Kuai
2025-11-21  6:28 ` [PATCH v2 9/9] block/blk-rq-qos: cleanup rq_qos_add() Yu Kuai

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).