* [PATCH v3 00/10] blk-mq: fix possible deadlocks
@ 2025-11-30 2:43 Yu Kuai
2025-11-30 2:43 ` [PATCH v3 01/10] blk-mq-debugfs: factor out a helper to register debugfs for all rq_qos Yu Kuai
` (10 more replies)
0 siblings, 11 replies; 18+ messages in thread
From: Yu Kuai @ 2025-11-30 2:43 UTC (permalink / raw)
To: axboe, linux-block, tj, nilay, ming.lei, bvanassche; +Cc: yukuai
Changes in v3:
- remove the changes for blk-iolatency and blk-iocost in patch 2, since
  they don't have debugfs entries.
- add patch 9 to fix the lock order for blk-throttle.
Changes in v2:
- combine the two patch sets into one.
Yu Kuai (10):
blk-mq-debugfs: factor out a helper to register debugfs for all rq_qos
blk-rq-qos: fix possible debugfs_mutex deadlock
blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static
blk-mq-debugfs: warn about possible deadlock
block/blk-rq-qos: add a new helper rq_qos_add_frozen()
blk-wbt: fix incorrect lock order for rq_qos_mutex and freeze queue
blk-iocost: fix incorrect lock order for rq_qos_mutex and freeze queue
blk-iolatency: fix incorrect lock order for rq_qos_mutex and freeze
queue
blk-throttle: remove useless queue frozen
block/blk-rq-qos: cleanup rq_qos_add()
block/blk-iocost.c | 15 ++++++-----
block/blk-iolatency.c | 11 +++++---
block/blk-mq-debugfs.c | 57 ++++++++++++++++++++++++++++++------------
block/blk-mq-debugfs.h | 4 +--
block/blk-rq-qos.c | 27 ++++----------------
block/blk-sysfs.c | 4 +++
block/blk-throttle.c | 11 ++------
block/blk-throttle.h | 3 ++-
block/blk-wbt.c | 10 +++++++-
9 files changed, 79 insertions(+), 63 deletions(-)
--
2.51.0
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v3 01/10] blk-mq-debugfs: factor out a helper to register debugfs for all rq_qos
2025-11-30 2:43 [PATCH v3 00/10] blk-mq: fix possible deadlocks Yu Kuai
@ 2025-11-30 2:43 ` Yu Kuai
2025-11-30 2:43 ` [PATCH v3 02/10] blk-rq-qos: fix possible debugfs_mutex deadlock Yu Kuai
` (9 subsequent siblings)
10 siblings, 0 replies; 18+ messages in thread
From: Yu Kuai @ 2025-11-30 2:43 UTC (permalink / raw)
To: axboe, linux-block, tj, nilay, ming.lei, bvanassche; +Cc: yukuai
There is already a helper, blk_mq_debugfs_register_rqos(), to register
debugfs entries for a single rqos; however, it is called synchronously
when the rqos is created, while the queue is frozen.
Factor out a helper that registers debugfs entries for all rq_qos of a
queue, in preparation for fixing a possible deadlock caused by creating
blk-mq debugfs entries while the queue is still frozen.
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
block/blk-mq-debugfs.c | 23 +++++++++++++++--------
block/blk-mq-debugfs.h | 5 +++++
2 files changed, 20 insertions(+), 8 deletions(-)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 4896525b1c05..128d2aa6a20d 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -619,6 +619,20 @@ static void debugfs_create_files(struct dentry *parent, void *data,
(void *)attr, data, &blk_mq_debugfs_fops);
}
+void blk_mq_debugfs_register_rq_qos(struct request_queue *q)
+{
+ lockdep_assert_held(&q->debugfs_mutex);
+
+ if (q->rq_qos) {
+ struct rq_qos *rqos = q->rq_qos;
+
+ while (rqos) {
+ blk_mq_debugfs_register_rqos(rqos);
+ rqos = rqos->next;
+ }
+ }
+}
+
void blk_mq_debugfs_register(struct request_queue *q)
{
struct blk_mq_hw_ctx *hctx;
@@ -631,14 +645,7 @@ void blk_mq_debugfs_register(struct request_queue *q)
blk_mq_debugfs_register_hctx(q, hctx);
}
- if (q->rq_qos) {
- struct rq_qos *rqos = q->rq_qos;
-
- while (rqos) {
- blk_mq_debugfs_register_rqos(rqos);
- rqos = rqos->next;
- }
- }
+ blk_mq_debugfs_register_rq_qos(q);
}
static void blk_mq_debugfs_register_ctx(struct blk_mq_hw_ctx *hctx,
diff --git a/block/blk-mq-debugfs.h b/block/blk-mq-debugfs.h
index c80e453e3014..54948a266889 100644
--- a/block/blk-mq-debugfs.h
+++ b/block/blk-mq-debugfs.h
@@ -33,6 +33,7 @@ void blk_mq_debugfs_register_sched_hctx(struct request_queue *q,
struct blk_mq_hw_ctx *hctx);
void blk_mq_debugfs_unregister_sched_hctx(struct blk_mq_hw_ctx *hctx);
+void blk_mq_debugfs_register_rq_qos(struct request_queue *q);
void blk_mq_debugfs_register_rqos(struct rq_qos *rqos);
void blk_mq_debugfs_unregister_rqos(struct rq_qos *rqos);
#else
@@ -78,6 +79,10 @@ static inline void blk_mq_debugfs_register_rqos(struct rq_qos *rqos)
{
}
+static inline void blk_mq_debugfs_register_rq_qos(struct request_queue *q)
+{
+}
+
static inline void blk_mq_debugfs_unregister_rqos(struct rq_qos *rqos)
{
}
--
2.51.0
* [PATCH v3 02/10] blk-rq-qos: fix possible debugfs_mutex deadlock
2025-11-30 2:43 [PATCH v3 00/10] blk-mq: fix possible deadlocks Yu Kuai
2025-11-30 2:43 ` [PATCH v3 01/10] blk-mq-debugfs: factor out a helper to register debugfs for all rq_qos Yu Kuai
@ 2025-11-30 2:43 ` Yu Kuai
2025-11-30 2:43 ` [PATCH v3 03/10] blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static Yu Kuai
` (8 subsequent siblings)
10 siblings, 0 replies; 18+ messages in thread
From: Yu Kuai @ 2025-11-30 2:43 UTC (permalink / raw)
To: axboe, linux-block, tj, nilay, ming.lei, bvanassche; +Cc: yukuai
Currently rq-qos debugfs entries are created from rq_qos_add(), and
rq_qos_add() can be called while the queue is still frozen. This can
deadlock because creating new entries can trigger fs reclaim.
Fix this problem by delaying the creation of rq-qos debugfs entries
until after the queue is unfrozen.
- For wbt: 1) it can be initialized by default; fix this by calling the
  new helper after wbt_init() in wbt_enable_default(); 2) it can be
  initialized through sysfs; fix this by calling the new helper after
  the queue is unfrozen in queue_wb_lat_store().
- iocost and iolatency can only be initialized through blkcg
  configuration; however, they don't have debugfs entries for now,
  hence they are not handled yet.
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
block/blk-rq-qos.c | 7 -------
block/blk-sysfs.c | 4 ++++
block/blk-wbt.c | 6 +++++-
3 files changed, 9 insertions(+), 8 deletions(-)
diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index 654478dfbc20..d7ce99ce2e80 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -347,13 +347,6 @@ int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q);
blk_mq_unfreeze_queue(q, memflags);
-
- if (rqos->ops->debugfs_attrs) {
- mutex_lock(&q->debugfs_mutex);
- blk_mq_debugfs_register_rqos(rqos);
- mutex_unlock(&q->debugfs_mutex);
- }
-
return 0;
ebusy:
blk_mq_unfreeze_queue(q, memflags);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 8684c57498cc..cb0a12253c6e 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -681,6 +681,10 @@ static ssize_t queue_wb_lat_store(struct gendisk *disk, const char *page,
out:
blk_mq_unfreeze_queue(q, memflags);
+ mutex_lock(&q->debugfs_mutex);
+ blk_mq_debugfs_register_rq_qos(q);
+ mutex_unlock(&q->debugfs_mutex);
+
return ret;
}
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index eb8037bae0bd..b1ab0f297f24 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -724,8 +724,12 @@ void wbt_enable_default(struct gendisk *disk)
if (!blk_queue_registered(q))
return;
- if (queue_is_mq(q) && enable)
+ if (queue_is_mq(q) && enable) {
wbt_init(disk);
+ mutex_lock(&q->debugfs_mutex);
+ blk_mq_debugfs_register_rq_qos(q);
+ mutex_unlock(&q->debugfs_mutex);
+ }
}
EXPORT_SYMBOL_GPL(wbt_enable_default);
--
2.51.0
* [PATCH v3 03/10] blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static
2025-11-30 2:43 [PATCH v3 00/10] blk-mq: fix possible deadlocks Yu Kuai
2025-11-30 2:43 ` [PATCH v3 01/10] blk-mq-debugfs: factor out a helper to register debugfs for all rq_qos Yu Kuai
2025-11-30 2:43 ` [PATCH v3 02/10] blk-rq-qos: fix possible debugfs_mutex deadlock Yu Kuai
@ 2025-11-30 2:43 ` Yu Kuai
2025-11-30 2:43 ` [PATCH v3 04/10] blk-mq-debugfs: warn about possible deadlock Yu Kuai
` (7 subsequent siblings)
10 siblings, 0 replies; 18+ messages in thread
From: Yu Kuai @ 2025-11-30 2:43 UTC (permalink / raw)
To: axboe, linux-block, tj, nilay, ming.lei, bvanassche; +Cc: yukuai
blk_mq_debugfs_register_rqos() is now only used inside
blk-mq-debugfs.c, so make it static.
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
block/blk-mq-debugfs.c | 4 +++-
block/blk-mq-debugfs.h | 5 -----
2 files changed, 3 insertions(+), 6 deletions(-)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 128d2aa6a20d..99466595c0a4 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -14,6 +14,8 @@
#include "blk-mq-sched.h"
#include "blk-rq-qos.h"
+static void blk_mq_debugfs_register_rqos(struct rq_qos *rqos);
+
static int queue_poll_stat_show(void *data, struct seq_file *m)
{
return 0;
@@ -758,7 +760,7 @@ void blk_mq_debugfs_unregister_rqos(struct rq_qos *rqos)
rqos->debugfs_dir = NULL;
}
-void blk_mq_debugfs_register_rqos(struct rq_qos *rqos)
+static void blk_mq_debugfs_register_rqos(struct rq_qos *rqos)
{
struct request_queue *q = rqos->disk->queue;
const char *dir_name = rq_qos_id_to_name(rqos->id);
diff --git a/block/blk-mq-debugfs.h b/block/blk-mq-debugfs.h
index 54948a266889..d94daa66556b 100644
--- a/block/blk-mq-debugfs.h
+++ b/block/blk-mq-debugfs.h
@@ -34,7 +34,6 @@ void blk_mq_debugfs_register_sched_hctx(struct request_queue *q,
void blk_mq_debugfs_unregister_sched_hctx(struct blk_mq_hw_ctx *hctx);
void blk_mq_debugfs_register_rq_qos(struct request_queue *q);
-void blk_mq_debugfs_register_rqos(struct rq_qos *rqos);
void blk_mq_debugfs_unregister_rqos(struct rq_qos *rqos);
#else
static inline void blk_mq_debugfs_register(struct request_queue *q)
@@ -75,10 +74,6 @@ static inline void blk_mq_debugfs_unregister_sched_hctx(struct blk_mq_hw_ctx *hc
{
}
-static inline void blk_mq_debugfs_register_rqos(struct rq_qos *rqos)
-{
-}
-
static inline void blk_mq_debugfs_register_rq_qos(struct request_queue *q)
{
}
--
2.51.0
* [PATCH v3 04/10] blk-mq-debugfs: warn about possible deadlock
2025-11-30 2:43 [PATCH v3 00/10] blk-mq: fix possible deadlocks Yu Kuai
` (2 preceding siblings ...)
2025-11-30 2:43 ` [PATCH v3 03/10] blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static Yu Kuai
@ 2025-11-30 2:43 ` Yu Kuai
2025-11-30 2:43 ` [PATCH v3 05/10] block/blk-rq-qos: add a new helper rq_qos_add_frozen() Yu Kuai
` (6 subsequent siblings)
10 siblings, 0 replies; 18+ messages in thread
From: Yu Kuai @ 2025-11-30 2:43 UTC (permalink / raw)
To: axboe, linux-block, tj, nilay, ming.lei, bvanassche; +Cc: yukuai
Creating new debugfs entries can trigger fs reclaim, so this must not
be done with the queue frozen; likewise, other locks that can be held
while the queue is frozen must not be held during creation either. Add
a warning and lockdep assertions to catch such misuse.
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
block/blk-mq-debugfs.c | 30 +++++++++++++++++++++++-------
1 file changed, 23 insertions(+), 7 deletions(-)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 99466595c0a4..d54f8c29d2f4 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -610,9 +610,22 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_ctx_attrs[] = {
{},
};
-static void debugfs_create_files(struct dentry *parent, void *data,
+static void debugfs_create_files(struct request_queue *q, struct dentry *parent,
+ void *data,
const struct blk_mq_debugfs_attr *attr)
{
+ /*
+ * Creating new debugfs entries with the queue frozen has the risk of
+ * deadlock.
+ */
+ WARN_ON_ONCE(q->mq_freeze_depth != 0);
+ /*
+ * debugfs_mutex should not be nested under other locks that can be
+ * grabbed while queue is frozen.
+ */
+ lockdep_assert_not_held(&q->elevator_lock);
+ lockdep_assert_not_held(&q->rq_qos_mutex);
+
if (IS_ERR_OR_NULL(parent))
return;
@@ -640,7 +653,7 @@ void blk_mq_debugfs_register(struct request_queue *q)
struct blk_mq_hw_ctx *hctx;
unsigned long i;
- debugfs_create_files(q->debugfs_dir, q, blk_mq_debugfs_queue_attrs);
+ debugfs_create_files(q, q->debugfs_dir, q, blk_mq_debugfs_queue_attrs);
queue_for_each_hw_ctx(q, hctx, i) {
if (!hctx->debugfs_dir)
@@ -659,7 +672,8 @@ static void blk_mq_debugfs_register_ctx(struct blk_mq_hw_ctx *hctx,
snprintf(name, sizeof(name), "cpu%u", ctx->cpu);
ctx_dir = debugfs_create_dir(name, hctx->debugfs_dir);
- debugfs_create_files(ctx_dir, ctx, blk_mq_debugfs_ctx_attrs);
+ debugfs_create_files(hctx->queue, ctx_dir, ctx,
+ blk_mq_debugfs_ctx_attrs);
}
void blk_mq_debugfs_register_hctx(struct request_queue *q,
@@ -675,7 +689,8 @@ void blk_mq_debugfs_register_hctx(struct request_queue *q,
snprintf(name, sizeof(name), "hctx%u", hctx->queue_num);
hctx->debugfs_dir = debugfs_create_dir(name, q->debugfs_dir);
- debugfs_create_files(hctx->debugfs_dir, hctx, blk_mq_debugfs_hctx_attrs);
+ debugfs_create_files(q, hctx->debugfs_dir, hctx,
+ blk_mq_debugfs_hctx_attrs);
hctx_for_each_ctx(hctx, ctx, i)
blk_mq_debugfs_register_ctx(hctx, ctx);
@@ -726,7 +741,7 @@ void blk_mq_debugfs_register_sched(struct request_queue *q)
q->sched_debugfs_dir = debugfs_create_dir("sched", q->debugfs_dir);
- debugfs_create_files(q->sched_debugfs_dir, q, e->queue_debugfs_attrs);
+ debugfs_create_files(q, q->sched_debugfs_dir, q, e->queue_debugfs_attrs);
}
void blk_mq_debugfs_unregister_sched(struct request_queue *q)
@@ -775,7 +790,8 @@ static void blk_mq_debugfs_register_rqos(struct rq_qos *rqos)
q->debugfs_dir);
rqos->debugfs_dir = debugfs_create_dir(dir_name, q->rqos_debugfs_dir);
- debugfs_create_files(rqos->debugfs_dir, rqos, rqos->ops->debugfs_attrs);
+ debugfs_create_files(q, rqos->debugfs_dir, rqos,
+ rqos->ops->debugfs_attrs);
}
void blk_mq_debugfs_register_sched_hctx(struct request_queue *q,
@@ -798,7 +814,7 @@ void blk_mq_debugfs_register_sched_hctx(struct request_queue *q,
hctx->sched_debugfs_dir = debugfs_create_dir("sched",
hctx->debugfs_dir);
- debugfs_create_files(hctx->sched_debugfs_dir, hctx,
+ debugfs_create_files(q, hctx->sched_debugfs_dir, hctx,
e->hctx_debugfs_attrs);
}
--
2.51.0
* [PATCH v3 05/10] block/blk-rq-qos: add a new helper rq_qos_add_frozen()
2025-11-30 2:43 [PATCH v3 00/10] blk-mq: fix possible deadlocks Yu Kuai
` (3 preceding siblings ...)
2025-11-30 2:43 ` [PATCH v3 04/10] blk-mq-debugfs: warn about possible deadlock Yu Kuai
@ 2025-11-30 2:43 ` Yu Kuai
2025-11-30 2:43 ` [PATCH v3 06/10] blk-wbt: fix incorrect lock order for rq_qos_mutex and freeze queue Yu Kuai
` (5 subsequent siblings)
10 siblings, 0 replies; 18+ messages in thread
From: Yu Kuai @ 2025-11-30 2:43 UTC (permalink / raw)
To: axboe, linux-block, tj, nilay, ming.lei, bvanassche; +Cc: yukuai
The queue should not be frozen under rq_qos_mutex; see the example in
commit 9730763f4756 ("block: correct locking order for protecting
blk-wbt parameters"). This means the current implementation of
rq_qos_add() is problematic. Add a new helper that expects the caller
to have already frozen the queue, in preparation for fixing this
problem in the following patches.
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
block/blk-rq-qos.c | 21 +++++++++++++++++++++
block/blk-rq-qos.h | 2 ++
2 files changed, 23 insertions(+)
diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index d7ce99ce2e80..c1a94c2a9742 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -322,6 +322,27 @@ void rq_qos_exit(struct request_queue *q)
mutex_unlock(&q->rq_qos_mutex);
}
+int rq_qos_add_frozen(struct rq_qos *rqos, struct gendisk *disk,
+ enum rq_qos_id id, const struct rq_qos_ops *ops)
+{
+ struct request_queue *q = disk->queue;
+
+ WARN_ON_ONCE(q->mq_freeze_depth == 0);
+ lockdep_assert_held(&q->rq_qos_mutex);
+
+ if (rq_qos_id(q, id))
+ return -EBUSY;
+
+ rqos->disk = disk;
+ rqos->id = id;
+ rqos->ops = ops;
+ rqos->next = q->rq_qos;
+ q->rq_qos = rqos;
+ blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q);
+
+ return 0;
+}
+
int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
const struct rq_qos_ops *ops)
{
diff --git a/block/blk-rq-qos.h b/block/blk-rq-qos.h
index b538f2c0febc..8d9fb10ae526 100644
--- a/block/blk-rq-qos.h
+++ b/block/blk-rq-qos.h
@@ -87,6 +87,8 @@ static inline void rq_wait_init(struct rq_wait *rq_wait)
int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
const struct rq_qos_ops *ops);
+int rq_qos_add_frozen(struct rq_qos *rqos, struct gendisk *disk,
+ enum rq_qos_id id, const struct rq_qos_ops *ops);
void rq_qos_del(struct rq_qos *rqos);
typedef bool (acquire_inflight_cb_t)(struct rq_wait *rqw, void *private_data);
--
2.51.0
* [PATCH v3 06/10] blk-wbt: fix incorrect lock order for rq_qos_mutex and freeze queue
2025-11-30 2:43 [PATCH v3 00/10] blk-mq: fix possible deadlocks Yu Kuai
` (4 preceding siblings ...)
2025-11-30 2:43 ` [PATCH v3 05/10] block/blk-rq-qos: add a new helper rq_qos_add_frozen() Yu Kuai
@ 2025-11-30 2:43 ` Yu Kuai
2025-12-01 0:29 ` Ming Lei
2025-11-30 2:43 ` [PATCH v3 07/10] blk-iocost: " Yu Kuai
` (4 subsequent siblings)
10 siblings, 1 reply; 18+ messages in thread
From: Yu Kuai @ 2025-11-30 2:43 UTC (permalink / raw)
To: axboe, linux-block, tj, nilay, ming.lei, bvanassche; +Cc: yukuai
wbt_init() can be called from a sysfs attribute and from
wbt_enable_default(), but the two paths take the locks in opposite
orders:
- queue_wb_lat_store() freezes the queue first, and then wbt_init()
  takes rq_qos_mutex. The queue is frozen again inside rq_qos_add(),
  but this recursive freeze is effectively a no-op;
- wbt_enable_default(), called during elevator switch, takes
  rq_qos_mutex first, and then rq_qos_add() freezes the queue.
Fix this problem by converting wbt_init() to the new helper
rq_qos_add_frozen(), and by freezing the queue in wbt_enable_default()
before calling wbt_init().
Fixes: a13bd91be223 ("block/rq_qos: protect rq_qos apis with a new lock")
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
block/blk-wbt.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index b1ab0f297f24..5e7e481103a1 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -725,7 +725,11 @@ void wbt_enable_default(struct gendisk *disk)
return;
if (queue_is_mq(q) && enable) {
+ unsigned int memflags = blk_mq_freeze_queue(q);
+
wbt_init(disk);
+ blk_mq_unfreeze_queue(q, memflags);
+
mutex_lock(&q->debugfs_mutex);
blk_mq_debugfs_register_rq_qos(q);
mutex_unlock(&q->debugfs_mutex);
@@ -926,7 +930,7 @@ int wbt_init(struct gendisk *disk)
* Assign rwb and add the stats callback.
*/
mutex_lock(&q->rq_qos_mutex);
- ret = rq_qos_add(&rwb->rqos, disk, RQ_QOS_WBT, &wbt_rqos_ops);
+ ret = rq_qos_add_frozen(&rwb->rqos, disk, RQ_QOS_WBT, &wbt_rqos_ops);
mutex_unlock(&q->rq_qos_mutex);
if (ret)
goto err_free;
--
2.51.0
* [PATCH v3 07/10] blk-iocost: fix incorrect lock order for rq_qos_mutex and freeze queue
2025-11-30 2:43 [PATCH v3 00/10] blk-mq: fix possible deadlocks Yu Kuai
` (5 preceding siblings ...)
2025-11-30 2:43 ` [PATCH v3 06/10] blk-wbt: fix incorrect lock order for rq_qos_mutex and freeze queue Yu Kuai
@ 2025-11-30 2:43 ` Yu Kuai
2025-11-30 2:43 ` [PATCH v3 08/10] blk-iolatency: " Yu Kuai
` (3 subsequent siblings)
10 siblings, 0 replies; 18+ messages in thread
From: Yu Kuai @ 2025-11-30 2:43 UTC (permalink / raw)
To: axboe, linux-block, tj, nilay, ming.lei, bvanassche; +Cc: yukuai
Like wbt, rq_qos_add() can be reached from two paths that take the
locks in opposite orders:
- From ioc_qos_write(), the queue is already frozen before
  rq_qos_add();
- From ioc_cost_model_write(), rq_qos_add() is called directly.
Fix this problem by converting ioc_cost_model_write() to
blkg_conf_open_bdev_frozen(); then, since all rq_qos_add() callers
already freeze the queue, convert to rq_qos_add_frozen().
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
block/blk-iocost.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)
diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index a0416927d33d..929fc1421d7e 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -2925,7 +2925,7 @@ static int blk_iocost_init(struct gendisk *disk)
* called before policy activation completion, can't assume that the
* target bio has an iocg associated and need to test for NULL iocg.
*/
- ret = rq_qos_add(&ioc->rqos, disk, RQ_QOS_COST, &ioc_rqos_ops);
+ ret = rq_qos_add_frozen(&ioc->rqos, disk, RQ_QOS_COST, &ioc_rqos_ops);
if (ret)
goto err_free_ioc;
@@ -3408,7 +3408,7 @@ static ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input,
{
struct blkg_conf_ctx ctx;
struct request_queue *q;
- unsigned int memflags;
+ unsigned long memflags;
struct ioc *ioc;
u64 u[NR_I_LCOEFS];
bool user;
@@ -3417,9 +3417,11 @@ static ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input,
blkg_conf_init(&ctx, input);
- ret = blkg_conf_open_bdev(&ctx);
- if (ret)
+ memflags = blkg_conf_open_bdev_frozen(&ctx);
+ if (IS_ERR_VALUE(memflags)) {
+ ret = memflags;
goto err;
+ }
body = ctx.body;
q = bdev_get_queue(ctx.bdev);
@@ -3436,7 +3438,6 @@ static ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input,
ioc = q_to_ioc(q);
}
- memflags = blk_mq_freeze_queue(q);
blk_mq_quiesce_queue(q);
spin_lock_irq(&ioc->lock);
@@ -3488,20 +3489,18 @@ static ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input,
spin_unlock_irq(&ioc->lock);
blk_mq_unquiesce_queue(q);
- blk_mq_unfreeze_queue(q, memflags);
- blkg_conf_exit(&ctx);
+ blkg_conf_exit_frozen(&ctx, memflags);
return nbytes;
einval:
spin_unlock_irq(&ioc->lock);
blk_mq_unquiesce_queue(q);
- blk_mq_unfreeze_queue(q, memflags);
ret = -EINVAL;
err:
- blkg_conf_exit(&ctx);
+ blkg_conf_exit_frozen(&ctx, memflags);
return ret;
}
--
2.51.0
* [PATCH v3 08/10] blk-iolatency: fix incorrect lock order for rq_qos_mutex and freeze queue
2025-11-30 2:43 [PATCH v3 00/10] blk-mq: fix possible deadlocks Yu Kuai
` (6 preceding siblings ...)
2025-11-30 2:43 ` [PATCH v3 07/10] blk-iocost: " Yu Kuai
@ 2025-11-30 2:43 ` Yu Kuai
2025-11-30 2:43 ` [PATCH v3 09/10] blk-throttle: remove useless queue frozen Yu Kuai
` (2 subsequent siblings)
10 siblings, 0 replies; 18+ messages in thread
From: Yu Kuai @ 2025-11-30 2:43 UTC (permalink / raw)
To: axboe, linux-block, tj, nilay, ming.lei, bvanassche; +Cc: yukuai
Currently blk-iolatency takes rq_qos_mutex first and then calls
rq_qos_add(), which freezes the queue.
Fix this problem by converting iolatency_set_limit() to
blkg_conf_open_bdev_frozen(), and by converting to rq_qos_add_frozen().
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
block/blk-iolatency.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index 45bd18f68541..1558afbf517b 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -764,8 +764,8 @@ static int blk_iolatency_init(struct gendisk *disk)
if (!blkiolat)
return -ENOMEM;
- ret = rq_qos_add(&blkiolat->rqos, disk, RQ_QOS_LATENCY,
- &blkcg_iolatency_ops);
+ ret = rq_qos_add_frozen(&blkiolat->rqos, disk, RQ_QOS_LATENCY,
+ &blkcg_iolatency_ops);
if (ret)
goto err_free;
ret = blkcg_activate_policy(disk, &blkcg_policy_iolatency);
@@ -831,16 +831,19 @@ static ssize_t iolatency_set_limit(struct kernfs_open_file *of, char *buf,
struct blkcg_gq *blkg;
struct blkg_conf_ctx ctx;
struct iolatency_grp *iolat;
+ unsigned long memflags;
char *p, *tok;
u64 lat_val = 0;
u64 oldval;
- int ret;
+ int ret = 0;
blkg_conf_init(&ctx, buf);
- ret = blkg_conf_open_bdev(&ctx);
- if (ret)
+ memflags = blkg_conf_open_bdev_frozen(&ctx);
+ if (IS_ERR_VALUE(memflags)) {
+ ret = memflags;
goto out;
+ }
/*
* blk_iolatency_init() may fail after rq_qos_add() succeeds which can
@@ -890,7 +893,7 @@ static ssize_t iolatency_set_limit(struct kernfs_open_file *of, char *buf,
iolatency_clear_scaling(blkg);
ret = 0;
out:
- blkg_conf_exit(&ctx);
+ blkg_conf_exit_frozen(&ctx, memflags);
return ret ?: nbytes;
}
--
2.51.0
* [PATCH v3 09/10] blk-throttle: remove useless queue frozen
2025-11-30 2:43 [PATCH v3 00/10] blk-mq: fix possible deadlocks Yu Kuai
` (7 preceding siblings ...)
2025-11-30 2:43 ` [PATCH v3 08/10] blk-iolatency: " Yu Kuai
@ 2025-11-30 2:43 ` Yu Kuai
2025-11-30 2:43 ` [PATCH v3 10/10] block/blk-rq-qos: cleanup rq_qos_add() Yu Kuai
2025-11-30 10:09 ` [syzbot ci] Re: blk-mq: fix possible deadlocks syzbot ci
10 siblings, 0 replies; 18+ messages in thread
From: Yu Kuai @ 2025-11-30 2:43 UTC (permalink / raw)
To: axboe, linux-block, tj, nilay, ming.lei, bvanassche; +Cc: yukuai
blk-throttle still holds rq_qos_mutex before freezing the queue in
blk_throtl_init().
However, blk_throtl_bio() can be called before q_usage_counter is
grabbed, hence freezing the queue does not really stop new IO from
being issued to blk-throttle. Remove the freeze.
Also use READ_ONCE()/WRITE_ONCE() for q->td, because blk_throtl_init()
can run concurrently with IO issue.
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
block/blk-throttle.c | 11 ++---------
block/blk-throttle.h | 3 ++-
2 files changed, 4 insertions(+), 10 deletions(-)
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 97188a795848..6c63b9714afa 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1310,7 +1310,6 @@ static int blk_throtl_init(struct gendisk *disk)
{
struct request_queue *q = disk->queue;
struct throtl_data *td;
- unsigned int memflags;
int ret;
td = kzalloc_node(sizeof(*td), GFP_KERNEL, q->node);
@@ -1320,22 +1319,16 @@ static int blk_throtl_init(struct gendisk *disk)
INIT_WORK(&td->dispatch_work, blk_throtl_dispatch_work_fn);
throtl_service_queue_init(&td->service_queue);
- memflags = blk_mq_freeze_queue(disk->queue);
- blk_mq_quiesce_queue(disk->queue);
-
- q->td = td;
+ WRITE_ONCE(q->td, td);
td->queue = q;
/* activate policy, blk_throtl_activated() will return true */
ret = blkcg_activate_policy(disk, &blkcg_policy_throtl);
if (ret) {
- q->td = NULL;
+ WRITE_ONCE(q->td, NULL);
kfree(td);
}
- blk_mq_unquiesce_queue(disk->queue);
- blk_mq_unfreeze_queue(disk->queue, memflags);
-
return ret;
}
diff --git a/block/blk-throttle.h b/block/blk-throttle.h
index 9d7a42c039a1..3d177b20f9e1 100644
--- a/block/blk-throttle.h
+++ b/block/blk-throttle.h
@@ -162,7 +162,8 @@ static inline bool blk_throtl_activated(struct request_queue *q)
* blkcg_policy_enabled() guarantees that the policy is activated
* in the request_queue.
*/
- return q->td != NULL && blkcg_policy_enabled(q, &blkcg_policy_throtl);
+ return READ_ONCE(q->td) &&
+ blkcg_policy_enabled(q, &blkcg_policy_throtl);
}
static inline bool blk_should_throtl(struct bio *bio)
--
2.51.0
* [PATCH v3 10/10] block/blk-rq-qos: cleanup rq_qos_add()
2025-11-30 2:43 [PATCH v3 00/10] blk-mq: fix possible deadlocks Yu Kuai
` (8 preceding siblings ...)
2025-11-30 2:43 ` [PATCH v3 09/10] blk-throttle: remove useless queue frozen Yu Kuai
@ 2025-11-30 2:43 ` Yu Kuai
2025-11-30 10:09 ` [syzbot ci] Re: blk-mq: fix possible deadlocks syzbot ci
10 siblings, 0 replies; 18+ messages in thread
From: Yu Kuai @ 2025-11-30 2:43 UTC (permalink / raw)
To: axboe, linux-block, tj, nilay, ming.lei, bvanassche; +Cc: yukuai
Now that rq_qos_add() has no remaining callers, remove it, and rename
rq_qos_add_frozen() back to rq_qos_add().
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
block/blk-iocost.c | 2 +-
block/blk-iolatency.c | 4 ++--
block/blk-rq-qos.c | 35 ++---------------------------------
block/blk-rq-qos.h | 2 --
block/blk-wbt.c | 2 +-
5 files changed, 6 insertions(+), 39 deletions(-)
diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 929fc1421d7e..0359a5b65202 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -2925,7 +2925,7 @@ static int blk_iocost_init(struct gendisk *disk)
* called before policy activation completion, can't assume that the
* target bio has an iocg associated and need to test for NULL iocg.
*/
- ret = rq_qos_add_frozen(&ioc->rqos, disk, RQ_QOS_COST, &ioc_rqos_ops);
+ ret = rq_qos_add(&ioc->rqos, disk, RQ_QOS_COST, &ioc_rqos_ops);
if (ret)
goto err_free_ioc;
diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index 1558afbf517b..5b18125e21c9 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -764,8 +764,8 @@ static int blk_iolatency_init(struct gendisk *disk)
if (!blkiolat)
return -ENOMEM;
- ret = rq_qos_add_frozen(&blkiolat->rqos, disk, RQ_QOS_LATENCY,
- &blkcg_iolatency_ops);
+ ret = rq_qos_add(&blkiolat->rqos, disk, RQ_QOS_LATENCY,
+ &blkcg_iolatency_ops);
if (ret)
goto err_free;
ret = blkcg_activate_policy(disk, &blkcg_policy_iolatency);
diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index c1a94c2a9742..8de7dae3273e 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -322,8 +322,8 @@ void rq_qos_exit(struct request_queue *q)
mutex_unlock(&q->rq_qos_mutex);
}
-int rq_qos_add_frozen(struct rq_qos *rqos, struct gendisk *disk,
- enum rq_qos_id id, const struct rq_qos_ops *ops)
+int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
+ const struct rq_qos_ops *ops)
{
struct request_queue *q = disk->queue;
@@ -343,37 +343,6 @@ int rq_qos_add_frozen(struct rq_qos *rqos, struct gendisk *disk,
return 0;
}
-int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
- const struct rq_qos_ops *ops)
-{
- struct request_queue *q = disk->queue;
- unsigned int memflags;
-
- lockdep_assert_held(&q->rq_qos_mutex);
-
- rqos->disk = disk;
- rqos->id = id;
- rqos->ops = ops;
-
- /*
- * No IO can be in-flight when adding rqos, so freeze queue, which
- * is fine since we only support rq_qos for blk-mq queue.
- */
- memflags = blk_mq_freeze_queue(q);
-
- if (rq_qos_id(q, rqos->id))
- goto ebusy;
- rqos->next = q->rq_qos;
- q->rq_qos = rqos;
- blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q);
-
- blk_mq_unfreeze_queue(q, memflags);
- return 0;
-ebusy:
- blk_mq_unfreeze_queue(q, memflags);
- return -EBUSY;
-}
-
void rq_qos_del(struct rq_qos *rqos)
{
struct request_queue *q = rqos->disk->queue;
diff --git a/block/blk-rq-qos.h b/block/blk-rq-qos.h
index 8d9fb10ae526..b538f2c0febc 100644
--- a/block/blk-rq-qos.h
+++ b/block/blk-rq-qos.h
@@ -87,8 +87,6 @@ static inline void rq_wait_init(struct rq_wait *rq_wait)
int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
const struct rq_qos_ops *ops);
-int rq_qos_add_frozen(struct rq_qos *rqos, struct gendisk *disk,
- enum rq_qos_id id, const struct rq_qos_ops *ops);
void rq_qos_del(struct rq_qos *rqos);
typedef bool (acquire_inflight_cb_t)(struct rq_wait *rqw, void *private_data);
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index 5e7e481103a1..0f90d9a97ef4 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -930,7 +930,7 @@ int wbt_init(struct gendisk *disk)
* Assign rwb and add the stats callback.
*/
mutex_lock(&q->rq_qos_mutex);
- ret = rq_qos_add_frozen(&rwb->rqos, disk, RQ_QOS_WBT, &wbt_rqos_ops);
+ ret = rq_qos_add(&rwb->rqos, disk, RQ_QOS_WBT, &wbt_rqos_ops);
mutex_unlock(&q->rq_qos_mutex);
if (ret)
goto err_free;
--
2.51.0
* [syzbot ci] Re: blk-mq: fix possible deadlocks
2025-11-30 2:43 [PATCH v3 00/10] blk-mq: fix possible deadlocks Yu Kuai
` (9 preceding siblings ...)
2025-11-30 2:43 ` [PATCH v3 10/10] block/blk-rq-qos: cleanup rq_qos_add() Yu Kuai
@ 2025-11-30 10:09 ` syzbot ci
2025-11-30 19:50 ` Yu Kuai
10 siblings, 1 reply; 18+ messages in thread
From: syzbot ci @ 2025-11-30 10:09 UTC (permalink / raw)
To: axboe, bvanassche, linux-block, ming.lei, nilay, tj, yukuai
Cc: syzbot, syzkaller-bugs
syzbot ci has tested the following series
[v3] blk-mq: fix possible deadlocks
https://lore.kernel.org/all/20251130024349.2302128-1-yukuai@fnnas.com
* [PATCH v3 01/10] blk-mq-debugfs: factor out a helper to register debugfs for all rq_qos
* [PATCH v3 02/10] blk-rq-qos: fix possible debugfs_mutex deadlock
* [PATCH v3 03/10] blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static
* [PATCH v3 04/10] blk-mq-debugfs: warn about possible deadlock
* [PATCH v3 05/10] block/blk-rq-qos: add a new helper rq_qos_add_frozen()
* [PATCH v3 06/10] blk-wbt: fix incorrect lock order for rq_qos_mutex and freeze queue
* [PATCH v3 07/10] blk-iocost: fix incorrect lock order for rq_qos_mutex and freeze queue
* [PATCH v3 08/10] blk-iolatency: fix incorrect lock order for rq_qos_mutex and freeze queue
* [PATCH v3 09/10] blk-throttle: remove useless queue frozen
* [PATCH v3 10/10] block/blk-rq-qos: cleanup rq_qos_add()
and found the following issue:
possible deadlock in pcpu_alloc_noprof
Full report is available here:
https://ci.syzbot.org/series/1aec77f0-c53f-4b3b-93fb-b3853983b6bd
***
possible deadlock in pcpu_alloc_noprof
tree: linux-next
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next
base: 7d31f578f3230f3b7b33b0930b08f9afd8429817
arch: amd64
compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config: https://ci.syzbot.org/builds/70dca9e4-6667-4930-9024-150d656e503e/config
soft_limit_in_bytes is deprecated and will be removed. Please report your usecase to linux-mm@kvack.org if you depend on this functionality.
======================================================
WARNING: possible circular locking dependency detected
syzkaller #0 Not tainted
------------------------------------------------------
syz-executor/6047 is trying to acquire lock:
ffffffff8e04f760 (fs_reclaim){+.+.}-{0:0}, at: prepare_alloc_pages+0x152/0x650
but task is already holding lock:
ffffffff8e02dde8 (pcpu_alloc_mutex){+.+.}-{4:4}, at: pcpu_alloc_noprof+0x25b/0x1750
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (pcpu_alloc_mutex){+.+.}-{4:4}:
__mutex_lock+0x187/0x1350
pcpu_alloc_noprof+0x25b/0x1750
blk_stat_alloc_callback+0xd5/0x220
wbt_init+0xa3/0x500
wbt_enable_default+0x25d/0x350
blk_register_queue+0x36a/0x3f0
__add_disk+0x677/0xd50
add_disk_fwnode+0xfc/0x480
loop_add+0x7f0/0xad0
loop_init+0xd9/0x170
do_one_initcall+0x1fb/0x820
do_initcall_level+0x104/0x190
do_initcalls+0x59/0xa0
kernel_init_freeable+0x334/0x4b0
kernel_init+0x1d/0x1d0
ret_from_fork+0x599/0xb30
ret_from_fork_asm+0x1a/0x30
-> #1 (&q->q_usage_counter(io)#17){++++}-{0:0}:
blk_alloc_queue+0x538/0x620
__blk_mq_alloc_disk+0x15c/0x340
loop_add+0x411/0xad0
loop_init+0xd9/0x170
do_one_initcall+0x1fb/0x820
do_initcall_level+0x104/0x190
do_initcalls+0x59/0xa0
kernel_init_freeable+0x334/0x4b0
kernel_init+0x1d/0x1d0
ret_from_fork+0x599/0xb30
ret_from_fork_asm+0x1a/0x30
-> #0 (fs_reclaim){+.+.}-{0:0}:
__lock_acquire+0x15a6/0x2cf0
lock_acquire+0x117/0x340
fs_reclaim_acquire+0x72/0x100
prepare_alloc_pages+0x152/0x650
__alloc_frozen_pages_noprof+0x123/0x370
__alloc_pages_noprof+0xa/0x30
pcpu_populate_chunk+0x182/0xb30
pcpu_alloc_noprof+0xcb6/0x1750
xt_percpu_counter_alloc+0x161/0x220
translate_table+0x1323/0x2040
ip6t_register_table+0x106/0x7d0
ip6table_nat_table_init+0x43/0x2e0
xt_find_table_lock+0x30c/0x3e0
xt_request_find_table_lock+0x26/0x100
do_ip6t_get_ctl+0x730/0x1180
nf_getsockopt+0x26e/0x290
ipv6_getsockopt+0x1ed/0x290
do_sock_getsockopt+0x2b4/0x3d0
__x64_sys_getsockopt+0x1a5/0x250
do_syscall_64+0xfa/0xf80
entry_SYSCALL_64_after_hwframe+0x77/0x7f
other info that might help us debug this:
Chain exists of:
fs_reclaim --> &q->q_usage_counter(io)#17 --> pcpu_alloc_mutex
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(pcpu_alloc_mutex);
lock(&q->q_usage_counter(io)#17);
lock(pcpu_alloc_mutex);
lock(fs_reclaim);
*** DEADLOCK ***
1 lock held by syz-executor/6047:
#0: ffffffff8e02dde8 (pcpu_alloc_mutex){+.+.}-{4:4}, at: pcpu_alloc_noprof+0x25b/0x1750
stack backtrace:
CPU: 0 UID: 0 PID: 6047 Comm: syz-executor Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x189/0x250
print_circular_bug+0x2e2/0x300
check_noncircular+0x12e/0x150
__lock_acquire+0x15a6/0x2cf0
lock_acquire+0x117/0x340
fs_reclaim_acquire+0x72/0x100
prepare_alloc_pages+0x152/0x650
__alloc_frozen_pages_noprof+0x123/0x370
__alloc_pages_noprof+0xa/0x30
pcpu_populate_chunk+0x182/0xb30
pcpu_alloc_noprof+0xcb6/0x1750
xt_percpu_counter_alloc+0x161/0x220
translate_table+0x1323/0x2040
ip6t_register_table+0x106/0x7d0
ip6table_nat_table_init+0x43/0x2e0
xt_find_table_lock+0x30c/0x3e0
xt_request_find_table_lock+0x26/0x100
do_ip6t_get_ctl+0x730/0x1180
nf_getsockopt+0x26e/0x290
ipv6_getsockopt+0x1ed/0x290
do_sock_getsockopt+0x2b4/0x3d0
__x64_sys_getsockopt+0x1a5/0x250
do_syscall_64+0xfa/0xf80
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7feba799150a
Code: ff c3 66 0f 1f 44 00 00 48 c7 c2 a8 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff eb b8 0f 1f 44 00 00 49 89 ca b8 37 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 06 c3 0f 1f 44 00 00 48 c7 c2 a8 ff ff ff f7
RSP: 002b:00007fff14c6a9e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000037
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007feba799150a
RDX: 0000000000000040 RSI: 0000000000000029 RDI: 0000000000000003
RBP: 0000000000000029 R08: 00007fff14c6aa0c R09: ffffffffff000000
R10: 00007feba7bb6368 R11: 0000000000000246 R12: 00007feba7a30907
R13: 00007feba7bb7e60 R14: 00007feba7bb6368 R15: 00007feba7bb6360
</TASK>
***
If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
Tested-by: syzbot@syzkaller.appspotmail.com
---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [syzbot ci] Re: blk-mq: fix possible deadlocks
2025-11-30 10:09 ` [syzbot ci] Re: blk-mq: fix possible deadlocks syzbot ci
@ 2025-11-30 19:50 ` Yu Kuai
2025-12-01 0:26 ` Ming Lei
0 siblings, 1 reply; 18+ messages in thread
From: Yu Kuai @ 2025-11-30 19:50 UTC (permalink / raw)
To: syzbot ci, axboe, bvanassche, linux-block, ming.lei, nilay, tj,
Yu Kuai
Cc: syzbot, syzkaller-bugs
Hi,
在 2025/11/30 18:09, syzbot ci 写道:
> syzbot ci has tested the following series
>
> [v3] blk-mq: fix possible deadlocks
> https://lore.kernel.org/all/20251130024349.2302128-1-yukuai@fnnas.com
> * [PATCH v3 01/10] blk-mq-debugfs: factor out a helper to register debugfs for all rq_qos
> * [PATCH v3 02/10] blk-rq-qos: fix possible debugfs_mutex deadlock
> * [PATCH v3 03/10] blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static
> * [PATCH v3 04/10] blk-mq-debugfs: warn about possible deadlock
> * [PATCH v3 05/10] block/blk-rq-qos: add a new helper rq_qos_add_frozen()
> * [PATCH v3 06/10] blk-wbt: fix incorrect lock order for rq_qos_mutex and freeze queue
> * [PATCH v3 07/10] blk-iocost: fix incorrect lock order for rq_qos_mutex and freeze queue
> * [PATCH v3 08/10] blk-iolatency: fix incorrect lock order for rq_qos_mutex and freeze queue
> * [PATCH v3 09/10] blk-throttle: remove useless queue frozen
> * [PATCH v3 10/10] block/blk-rq-qos: cleanup rq_qos_add()
>
> and found the following issue:
> possible deadlock in pcpu_alloc_noprof
>
> Full report is available here:
> https://ci.syzbot.org/series/1aec77f0-c53f-4b3b-93fb-b3853983b6bd
>
> ***
>
> possible deadlock in pcpu_alloc_noprof
>
> tree: linux-next
> URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next
> base: 7d31f578f3230f3b7b33b0930b08f9afd8429817
> arch: amd64
> compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
> config: https://ci.syzbot.org/builds/70dca9e4-6667-4930-9024-150d656e503e/config
>
> soft_limit_in_bytes is deprecated and will be removed. Please report your usecase to linux-mm@kvack.org if you depend on this functionality.
> ======================================================
> WARNING: possible circular locking dependency detected
> syzkaller #0 Not tainted
> ------------------------------------------------------
> syz-executor/6047 is trying to acquire lock:
> ffffffff8e04f760 (fs_reclaim){+.+.}-{0:0}, at: prepare_alloc_pages+0x152/0x650
>
> but task is already holding lock:
> ffffffff8e02dde8 (pcpu_alloc_mutex){+.+.}-{4:4}, at: pcpu_alloc_noprof+0x25b/0x1750
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (pcpu_alloc_mutex){+.+.}-{4:4}:
> __mutex_lock+0x187/0x1350
> pcpu_alloc_noprof+0x25b/0x1750
> blk_stat_alloc_callback+0xd5/0x220
> wbt_init+0xa3/0x500
> wbt_enable_default+0x25d/0x350
> blk_register_queue+0x36a/0x3f0
> __add_disk+0x677/0xd50
> add_disk_fwnode+0xfc/0x480
> loop_add+0x7f0/0xad0
> loop_init+0xd9/0x170
> do_one_initcall+0x1fb/0x820
> do_initcall_level+0x104/0x190
> do_initcalls+0x59/0xa0
> kernel_init_freeable+0x334/0x4b0
> kernel_init+0x1d/0x1d0
> ret_from_fork+0x599/0xb30
> ret_from_fork_asm+0x1a/0x30
>
> -> #1 (&q->q_usage_counter(io)#17){++++}-{0:0}:
> blk_alloc_queue+0x538/0x620
> __blk_mq_alloc_disk+0x15c/0x340
> loop_add+0x411/0xad0
> loop_init+0xd9/0x170
> do_one_initcall+0x1fb/0x820
> do_initcall_level+0x104/0x190
> do_initcalls+0x59/0xa0
> kernel_init_freeable+0x334/0x4b0
> kernel_init+0x1d/0x1d0
> ret_from_fork+0x599/0xb30
> ret_from_fork_asm+0x1a/0x30
>
> -> #0 (fs_reclaim){+.+.}-{0:0}:
> __lock_acquire+0x15a6/0x2cf0
> lock_acquire+0x117/0x340
> fs_reclaim_acquire+0x72/0x100
> prepare_alloc_pages+0x152/0x650
> __alloc_frozen_pages_noprof+0x123/0x370
> __alloc_pages_noprof+0xa/0x30
> pcpu_populate_chunk+0x182/0xb30
> pcpu_alloc_noprof+0xcb6/0x1750
> xt_percpu_counter_alloc+0x161/0x220
> translate_table+0x1323/0x2040
> ip6t_register_table+0x106/0x7d0
> ip6table_nat_table_init+0x43/0x2e0
> xt_find_table_lock+0x30c/0x3e0
> xt_request_find_table_lock+0x26/0x100
> do_ip6t_get_ctl+0x730/0x1180
> nf_getsockopt+0x26e/0x290
> ipv6_getsockopt+0x1ed/0x290
> do_sock_getsockopt+0x2b4/0x3d0
> __x64_sys_getsockopt+0x1a5/0x250
> do_syscall_64+0xfa/0xf80
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> other info that might help us debug this:
>
> Chain exists of:
> fs_reclaim --> &q->q_usage_counter(io)#17 --> pcpu_alloc_mutex
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(pcpu_alloc_mutex);
> lock(&q->q_usage_counter(io)#17);
> lock(pcpu_alloc_mutex);
> lock(fs_reclaim);
This does not look like it was introduced by this set: wbt_init() will hold
pcpu_alloc_mutex, and it can already be called with the queue frozen even
without this set.
Looks like we should allocate rwb before freezing the queue, like what we
do in other paths.
>
> *** DEADLOCK ***
>
> 1 lock held by syz-executor/6047:
> #0: ffffffff8e02dde8 (pcpu_alloc_mutex){+.+.}-{4:4}, at: pcpu_alloc_noprof+0x25b/0x1750
>
> stack backtrace:
> CPU: 0 UID: 0 PID: 6047 Comm: syz-executor Not tainted syzkaller #0 PREEMPT(full)
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> Call Trace:
> <TASK>
> dump_stack_lvl+0x189/0x250
> print_circular_bug+0x2e2/0x300
> check_noncircular+0x12e/0x150
> __lock_acquire+0x15a6/0x2cf0
> lock_acquire+0x117/0x340
> fs_reclaim_acquire+0x72/0x100
> prepare_alloc_pages+0x152/0x650
> __alloc_frozen_pages_noprof+0x123/0x370
> __alloc_pages_noprof+0xa/0x30
> pcpu_populate_chunk+0x182/0xb30
> pcpu_alloc_noprof+0xcb6/0x1750
> xt_percpu_counter_alloc+0x161/0x220
> translate_table+0x1323/0x2040
> ip6t_register_table+0x106/0x7d0
> ip6table_nat_table_init+0x43/0x2e0
> xt_find_table_lock+0x30c/0x3e0
> xt_request_find_table_lock+0x26/0x100
> do_ip6t_get_ctl+0x730/0x1180
> nf_getsockopt+0x26e/0x290
> ipv6_getsockopt+0x1ed/0x290
> do_sock_getsockopt+0x2b4/0x3d0
> __x64_sys_getsockopt+0x1a5/0x250
> do_syscall_64+0xfa/0xf80
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7feba799150a
> Code: ff c3 66 0f 1f 44 00 00 48 c7 c2 a8 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff eb b8 0f 1f 44 00 00 49 89 ca b8 37 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 06 c3 0f 1f 44 00 00 48 c7 c2 a8 ff ff ff f7
> RSP: 002b:00007fff14c6a9e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000037
> RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007feba799150a
> RDX: 0000000000000040 RSI: 0000000000000029 RDI: 0000000000000003
> RBP: 0000000000000029 R08: 00007fff14c6aa0c R09: ffffffffff000000
> R10: 00007feba7bb6368 R11: 0000000000000246 R12: 00007feba7a30907
> R13: 00007feba7bb7e60 R14: 00007feba7bb6368 R15: 00007feba7bb6360
> </TASK>
>
>
> ***
>
> If these findings have caused you to resend the series or submit a
> separate fix, please add the following tag to your commit message:
> Tested-by: syzbot@syzkaller.appspotmail.com
>
> ---
> This report is generated by a bot. It may contain errors.
> syzbot ci engineers can be reached at syzkaller@googlegroups.com.
>
--
Thanks,
Kuai
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [syzbot ci] Re: blk-mq: fix possible deadlocks
2025-11-30 19:50 ` Yu Kuai
@ 2025-12-01 0:26 ` Ming Lei
2025-12-01 4:43 ` Yu Kuai
0 siblings, 1 reply; 18+ messages in thread
From: Ming Lei @ 2025-12-01 0:26 UTC (permalink / raw)
To: Yu Kuai
Cc: syzbot ci, axboe, bvanassche, linux-block, nilay, tj, syzbot,
syzkaller-bugs
On Mon, Dec 01, 2025 at 03:50:22AM +0800, Yu Kuai wrote:
> Hi,
>
> 在 2025/11/30 18:09, syzbot ci 写道:
> > syzbot ci has tested the following series
> >
> > [v3] blk-mq: fix possible deadlocks
> > https://lore.kernel.org/all/20251130024349.2302128-1-yukuai@fnnas.com
> > * [PATCH v3 01/10] blk-mq-debugfs: factor out a helper to register debugfs for all rq_qos
> > * [PATCH v3 02/10] blk-rq-qos: fix possible debugfs_mutex deadlock
> > * [PATCH v3 03/10] blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static
> > * [PATCH v3 04/10] blk-mq-debugfs: warn about possible deadlock
> > * [PATCH v3 05/10] block/blk-rq-qos: add a new helper rq_qos_add_frozen()
> > * [PATCH v3 06/10] blk-wbt: fix incorrect lock order for rq_qos_mutex and freeze queue
> > * [PATCH v3 07/10] blk-iocost: fix incorrect lock order for rq_qos_mutex and freeze queue
> > * [PATCH v3 08/10] blk-iolatency: fix incorrect lock order for rq_qos_mutex and freeze queue
> > * [PATCH v3 09/10] blk-throttle: remove useless queue frozen
> > * [PATCH v3 10/10] block/blk-rq-qos: cleanup rq_qos_add()
> >
> > and found the following issue:
> > possible deadlock in pcpu_alloc_noprof
> >
> > Full report is available here:
> > https://ci.syzbot.org/series/1aec77f0-c53f-4b3b-93fb-b3853983b6bd
> >
> > ***
> >
> > possible deadlock in pcpu_alloc_noprof
> >
> > tree: linux-next
> > URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next
> > base: 7d31f578f3230f3b7b33b0930b08f9afd8429817
> > arch: amd64
> > compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
> > config: https://ci.syzbot.org/builds/70dca9e4-6667-4930-9024-150d656e503e/config
> >
> > soft_limit_in_bytes is deprecated and will be removed. Please report your usecase to linux-mm@kvack.org if you depend on this functionality.
> > ======================================================
> > WARNING: possible circular locking dependency detected
> > syzkaller #0 Not tainted
> > ------------------------------------------------------
> > syz-executor/6047 is trying to acquire lock:
> > ffffffff8e04f760 (fs_reclaim){+.+.}-{0:0}, at: prepare_alloc_pages+0x152/0x650
> >
> > but task is already holding lock:
> > ffffffff8e02dde8 (pcpu_alloc_mutex){+.+.}-{4:4}, at: pcpu_alloc_noprof+0x25b/0x1750
> >
> > which lock already depends on the new lock.
> >
> >
> > the existing dependency chain (in reverse order) is:
> >
> > -> #2 (pcpu_alloc_mutex){+.+.}-{4:4}:
> > __mutex_lock+0x187/0x1350
> > pcpu_alloc_noprof+0x25b/0x1750
> > blk_stat_alloc_callback+0xd5/0x220
> > wbt_init+0xa3/0x500
> > wbt_enable_default+0x25d/0x350
> > blk_register_queue+0x36a/0x3f0
> > __add_disk+0x677/0xd50
> > add_disk_fwnode+0xfc/0x480
> > loop_add+0x7f0/0xad0
> > loop_init+0xd9/0x170
> > do_one_initcall+0x1fb/0x820
> > do_initcall_level+0x104/0x190
> > do_initcalls+0x59/0xa0
> > kernel_init_freeable+0x334/0x4b0
> > kernel_init+0x1d/0x1d0
> > ret_from_fork+0x599/0xb30
> > ret_from_fork_asm+0x1a/0x30
> >
> > -> #1 (&q->q_usage_counter(io)#17){++++}-{0:0}:
> > blk_alloc_queue+0x538/0x620
> > __blk_mq_alloc_disk+0x15c/0x340
> > loop_add+0x411/0xad0
> > loop_init+0xd9/0x170
> > do_one_initcall+0x1fb/0x820
> > do_initcall_level+0x104/0x190
> > do_initcalls+0x59/0xa0
> > kernel_init_freeable+0x334/0x4b0
> > kernel_init+0x1d/0x1d0
> > ret_from_fork+0x599/0xb30
> > ret_from_fork_asm+0x1a/0x30
> >
> > -> #0 (fs_reclaim){+.+.}-{0:0}:
> > __lock_acquire+0x15a6/0x2cf0
> > lock_acquire+0x117/0x340
> > fs_reclaim_acquire+0x72/0x100
> > prepare_alloc_pages+0x152/0x650
> > __alloc_frozen_pages_noprof+0x123/0x370
> > __alloc_pages_noprof+0xa/0x30
> > pcpu_populate_chunk+0x182/0xb30
> > pcpu_alloc_noprof+0xcb6/0x1750
> > xt_percpu_counter_alloc+0x161/0x220
> > translate_table+0x1323/0x2040
> > ip6t_register_table+0x106/0x7d0
> > ip6table_nat_table_init+0x43/0x2e0
> > xt_find_table_lock+0x30c/0x3e0
> > xt_request_find_table_lock+0x26/0x100
> > do_ip6t_get_ctl+0x730/0x1180
> > nf_getsockopt+0x26e/0x290
> > ipv6_getsockopt+0x1ed/0x290
> > do_sock_getsockopt+0x2b4/0x3d0
> > __x64_sys_getsockopt+0x1a5/0x250
> > do_syscall_64+0xfa/0xf80
> > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> >
> > other info that might help us debug this:
> >
> > Chain exists of:
> > fs_reclaim --> &q->q_usage_counter(io)#17 --> pcpu_alloc_mutex
> >
> > Possible unsafe locking scenario:
> >
> > CPU0 CPU1
> > ---- ----
> > lock(pcpu_alloc_mutex);
> > lock(&q->q_usage_counter(io)#17);
> > lock(pcpu_alloc_mutex);
> > lock(fs_reclaim);
>
> This does not look like introduced by this set, wbt_init() will hold
> pcpu_alloc_mutex, and it can be called with queue frozen without this
> set.
It is introduced by your patch 6, in which blk_mq_freeze_queue() is added
before calling wbt_init() from wbt_enable_default(); that is what triggers
the warning.
Thanks,
Ming
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 06/10] blk-wbt: fix incorrect lock order for rq_qos_mutex and freeze queue
2025-11-30 2:43 ` [PATCH v3 06/10] blk-wbt: fix incorrect lock order for rq_qos_mutex and freeze queue Yu Kuai
@ 2025-12-01 0:29 ` Ming Lei
0 siblings, 0 replies; 18+ messages in thread
From: Ming Lei @ 2025-12-01 0:29 UTC (permalink / raw)
To: Yu Kuai; +Cc: axboe, linux-block, tj, nilay, bvanassche
On Sun, Nov 30, 2025 at 10:43:45AM +0800, Yu Kuai wrote:
> wbt_init() can be called from sysfs attribute and wbt_enable_default(),
> however the lock order are inversely.
>
> - queue_wb_lat_store() freeze queue first, and then wbt_init() hold
> rq_qos_mutex. In this case queue will be frozen again inside
> rq_qos_add(), however, in this case freeze queue recursively is
> inoperative;
> - wbt_enable_default() from elevator switch will hold rq_qos_mutex
> first, and then rq_qos_add() will freeze queue;
>
> Fix this problem by converting to use new helper rq_qos_add_frozen() in
> wbt_init(), and for wbt_enable_default(), freeze queue before calling
> wbt_init().
>
> Fixes: a13bd91be223 ("block/rq_qos: protect rq_qos apis with a new lock")
> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
> ---
> block/blk-wbt.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/block/blk-wbt.c b/block/blk-wbt.c
> index b1ab0f297f24..5e7e481103a1 100644
> --- a/block/blk-wbt.c
> +++ b/block/blk-wbt.c
> @@ -725,7 +725,11 @@ void wbt_enable_default(struct gendisk *disk)
> return;
>
> if (queue_is_mq(q) && enable) {
> + unsigned int memflags = blk_mq_freeze_queue(q);
> +
> wbt_init(disk);
> + blk_mq_unfreeze_queue(q, memflags);
> +
This change causes new lockdep warning, see the report in:
https://lore.kernel.org/linux-block/692c17ca.a70a0220.d98e3.016c.GAE@google.com/
Thanks,
Ming
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [syzbot ci] Re: blk-mq: fix possible deadlocks
2025-12-01 0:26 ` Ming Lei
@ 2025-12-01 4:43 ` Yu Kuai
2025-12-01 8:37 ` Ming Lei
0 siblings, 1 reply; 18+ messages in thread
From: Yu Kuai @ 2025-12-01 4:43 UTC (permalink / raw)
To: Ming Lei
Cc: syzbot ci, axboe, bvanassche, linux-block, nilay, tj, syzbot,
syzkaller-bugs, Yu Kuai
Hi,
在 2025/12/1 8:26, Ming Lei 写道:
> On Mon, Dec 01, 2025 at 03:50:22AM +0800, Yu Kuai wrote:
>> Hi,
>>
>> 在 2025/11/30 18:09, syzbot ci 写道:
>>> syzbot ci has tested the following series
>>>
>>> [v3] blk-mq: fix possible deadlocks
>>> https://lore.kernel.org/all/20251130024349.2302128-1-yukuai@fnnas.com
>>> * [PATCH v3 01/10] blk-mq-debugfs: factor out a helper to register debugfs for all rq_qos
>>> * [PATCH v3 02/10] blk-rq-qos: fix possible debugfs_mutex deadlock
>>> * [PATCH v3 03/10] blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static
>>> * [PATCH v3 04/10] blk-mq-debugfs: warn about possible deadlock
>>> * [PATCH v3 05/10] block/blk-rq-qos: add a new helper rq_qos_add_frozen()
>>> * [PATCH v3 06/10] blk-wbt: fix incorrect lock order for rq_qos_mutex and freeze queue
>>> * [PATCH v3 07/10] blk-iocost: fix incorrect lock order for rq_qos_mutex and freeze queue
>>> * [PATCH v3 08/10] blk-iolatency: fix incorrect lock order for rq_qos_mutex and freeze queue
>>> * [PATCH v3 09/10] blk-throttle: remove useless queue frozen
>>> * [PATCH v3 10/10] block/blk-rq-qos: cleanup rq_qos_add()
>>>
>>> and found the following issue:
>>> possible deadlock in pcpu_alloc_noprof
>>>
>>> Full report is available here:
>>> https://ci.syzbot.org/series/1aec77f0-c53f-4b3b-93fb-b3853983b6bd
>>>
>>> ***
>>>
>>> possible deadlock in pcpu_alloc_noprof
>>>
>>> tree: linux-next
>>> URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next
>>> base: 7d31f578f3230f3b7b33b0930b08f9afd8429817
>>> arch: amd64
>>> compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
>>> config: https://ci.syzbot.org/builds/70dca9e4-6667-4930-9024-150d656e503e/config
>>>
>>> soft_limit_in_bytes is deprecated and will be removed. Please report your usecase to linux-mm@kvack.org if you depend on this functionality.
>>> ======================================================
>>> WARNING: possible circular locking dependency detected
>>> syzkaller #0 Not tainted
>>> ------------------------------------------------------
>>> syz-executor/6047 is trying to acquire lock:
>>> ffffffff8e04f760 (fs_reclaim){+.+.}-{0:0}, at: prepare_alloc_pages+0x152/0x650
>>>
>>> but task is already holding lock:
>>> ffffffff8e02dde8 (pcpu_alloc_mutex){+.+.}-{4:4}, at: pcpu_alloc_noprof+0x25b/0x1750
>>>
>>> which lock already depends on the new lock.
>>>
>>>
>>> the existing dependency chain (in reverse order) is:
>>>
>>> -> #2 (pcpu_alloc_mutex){+.+.}-{4:4}:
>>> __mutex_lock+0x187/0x1350
>>> pcpu_alloc_noprof+0x25b/0x1750
>>> blk_stat_alloc_callback+0xd5/0x220
>>> wbt_init+0xa3/0x500
>>> wbt_enable_default+0x25d/0x350
>>> blk_register_queue+0x36a/0x3f0
>>> __add_disk+0x677/0xd50
>>> add_disk_fwnode+0xfc/0x480
>>> loop_add+0x7f0/0xad0
>>> loop_init+0xd9/0x170
>>> do_one_initcall+0x1fb/0x820
>>> do_initcall_level+0x104/0x190
>>> do_initcalls+0x59/0xa0
>>> kernel_init_freeable+0x334/0x4b0
>>> kernel_init+0x1d/0x1d0
>>> ret_from_fork+0x599/0xb30
>>> ret_from_fork_asm+0x1a/0x30
>>>
>>> -> #1 (&q->q_usage_counter(io)#17){++++}-{0:0}:
>>> blk_alloc_queue+0x538/0x620
>>> __blk_mq_alloc_disk+0x15c/0x340
>>> loop_add+0x411/0xad0
>>> loop_init+0xd9/0x170
>>> do_one_initcall+0x1fb/0x820
>>> do_initcall_level+0x104/0x190
>>> do_initcalls+0x59/0xa0
>>> kernel_init_freeable+0x334/0x4b0
>>> kernel_init+0x1d/0x1d0
>>> ret_from_fork+0x599/0xb30
>>> ret_from_fork_asm+0x1a/0x30
>>>
>>> -> #0 (fs_reclaim){+.+.}-{0:0}:
>>> __lock_acquire+0x15a6/0x2cf0
>>> lock_acquire+0x117/0x340
>>> fs_reclaim_acquire+0x72/0x100
>>> prepare_alloc_pages+0x152/0x650
>>> __alloc_frozen_pages_noprof+0x123/0x370
>>> __alloc_pages_noprof+0xa/0x30
>>> pcpu_populate_chunk+0x182/0xb30
>>> pcpu_alloc_noprof+0xcb6/0x1750
>>> xt_percpu_counter_alloc+0x161/0x220
>>> translate_table+0x1323/0x2040
>>> ip6t_register_table+0x106/0x7d0
>>> ip6table_nat_table_init+0x43/0x2e0
>>> xt_find_table_lock+0x30c/0x3e0
>>> xt_request_find_table_lock+0x26/0x100
>>> do_ip6t_get_ctl+0x730/0x1180
>>> nf_getsockopt+0x26e/0x290
>>> ipv6_getsockopt+0x1ed/0x290
>>> do_sock_getsockopt+0x2b4/0x3d0
>>> __x64_sys_getsockopt+0x1a5/0x250
>>> do_syscall_64+0xfa/0xf80
>>> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>>>
>>> other info that might help us debug this:
>>>
>>> Chain exists of:
>>> fs_reclaim --> &q->q_usage_counter(io)#17 --> pcpu_alloc_mutex
>>>
>>> Possible unsafe locking scenario:
>>>
>>> CPU0 CPU1
>>> ---- ----
>>> lock(pcpu_alloc_mutex);
>>> lock(&q->q_usage_counter(io)#17);
>>> lock(pcpu_alloc_mutex);
>>> lock(fs_reclaim);
>> This does not look like introduced by this set, wbt_init() will hold
>> pcpu_alloc_mutex, and it can be called with queue frozen without this
>> set.
> It is introduced by your patch 6 in which blk_mq_freeze_queue() is added
> before calling wbt_init() from wbt_enable_default(); that is what triggers
> triggered.
Yes, I know this. What I mean is that the same ordering already exists
before this set in queue_wb_lat_store(), where the queue is frozen before
wbt_init() is called, so this is not a new problem.
>
>
> Thanks,
> Ming
>
>
--
Thanks,
Kuai
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [syzbot ci] Re: blk-mq: fix possible deadlocks
2025-12-01 4:43 ` Yu Kuai
@ 2025-12-01 8:37 ` Ming Lei
2025-12-01 8:41 ` Yu Kuai
0 siblings, 1 reply; 18+ messages in thread
From: Ming Lei @ 2025-12-01 8:37 UTC (permalink / raw)
To: Yu Kuai
Cc: syzbot ci, axboe, bvanassche, linux-block, nilay, tj, syzbot,
syzkaller-bugs
On Mon, Dec 01, 2025 at 12:43:03PM +0800, Yu Kuai wrote:
> Hi,
>
> 在 2025/12/1 8:26, Ming Lei 写道:
> > On Mon, Dec 01, 2025 at 03:50:22AM +0800, Yu Kuai wrote:
> >> Hi,
> >>
> >> 在 2025/11/30 18:09, syzbot ci 写道:
> >>> syzbot ci has tested the following series
> >>>
> >>> [v3] blk-mq: fix possible deadlocks
> >>> https://lore.kernel.org/all/20251130024349.2302128-1-yukuai@fnnas.com
> >>> * [PATCH v3 01/10] blk-mq-debugfs: factor out a helper to register debugfs for all rq_qos
> >>> * [PATCH v3 02/10] blk-rq-qos: fix possible debugfs_mutex deadlock
> >>> * [PATCH v3 03/10] blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static
> >>> * [PATCH v3 04/10] blk-mq-debugfs: warn about possible deadlock
> >>> * [PATCH v3 05/10] block/blk-rq-qos: add a new helper rq_qos_add_frozen()
> >>> * [PATCH v3 06/10] blk-wbt: fix incorrect lock order for rq_qos_mutex and freeze queue
> >>> * [PATCH v3 07/10] blk-iocost: fix incorrect lock order for rq_qos_mutex and freeze queue
> >>> * [PATCH v3 08/10] blk-iolatency: fix incorrect lock order for rq_qos_mutex and freeze queue
> >>> * [PATCH v3 09/10] blk-throttle: remove useless queue frozen
> >>> * [PATCH v3 10/10] block/blk-rq-qos: cleanup rq_qos_add()
> >>>
> >>> and found the following issue:
> >>> possible deadlock in pcpu_alloc_noprof
> >>>
> >>> Full report is available here:
> >>> https://ci.syzbot.org/series/1aec77f0-c53f-4b3b-93fb-b3853983b6bd
> >>>
> >>> ***
> >>>
> >>> possible deadlock in pcpu_alloc_noprof
> >>>
> >>> tree: linux-next
> >>> URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next
> >>> base: 7d31f578f3230f3b7b33b0930b08f9afd8429817
> >>> arch: amd64
> >>> compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
> >>> config: https://ci.syzbot.org/builds/70dca9e4-6667-4930-9024-150d656e503e/config
> >>>
> >>> soft_limit_in_bytes is deprecated and will be removed. Please report your usecase to linux-mm@kvack.org if you depend on this functionality.
> >>> ======================================================
> >>> WARNING: possible circular locking dependency detected
> >>> syzkaller #0 Not tainted
> >>> ------------------------------------------------------
> >>> syz-executor/6047 is trying to acquire lock:
> >>> ffffffff8e04f760 (fs_reclaim){+.+.}-{0:0}, at: prepare_alloc_pages+0x152/0x650
> >>>
> >>> but task is already holding lock:
> >>> ffffffff8e02dde8 (pcpu_alloc_mutex){+.+.}-{4:4}, at: pcpu_alloc_noprof+0x25b/0x1750
> >>>
> >>> which lock already depends on the new lock.
> >>>
> >>>
> >>> the existing dependency chain (in reverse order) is:
> >>>
> >>> -> #2 (pcpu_alloc_mutex){+.+.}-{4:4}:
> >>> __mutex_lock+0x187/0x1350
> >>> pcpu_alloc_noprof+0x25b/0x1750
> >>> blk_stat_alloc_callback+0xd5/0x220
> >>> wbt_init+0xa3/0x500
> >>> wbt_enable_default+0x25d/0x350
> >>> blk_register_queue+0x36a/0x3f0
> >>> __add_disk+0x677/0xd50
> >>> add_disk_fwnode+0xfc/0x480
> >>> loop_add+0x7f0/0xad0
> >>> loop_init+0xd9/0x170
> >>> do_one_initcall+0x1fb/0x820
> >>> do_initcall_level+0x104/0x190
> >>> do_initcalls+0x59/0xa0
> >>> kernel_init_freeable+0x334/0x4b0
> >>> kernel_init+0x1d/0x1d0
> >>> ret_from_fork+0x599/0xb30
> >>> ret_from_fork_asm+0x1a/0x30
> >>>
> >>> -> #1 (&q->q_usage_counter(io)#17){++++}-{0:0}:
> >>> blk_alloc_queue+0x538/0x620
> >>> __blk_mq_alloc_disk+0x15c/0x340
> >>> loop_add+0x411/0xad0
> >>> loop_init+0xd9/0x170
> >>> do_one_initcall+0x1fb/0x820
> >>> do_initcall_level+0x104/0x190
> >>> do_initcalls+0x59/0xa0
> >>> kernel_init_freeable+0x334/0x4b0
> >>> kernel_init+0x1d/0x1d0
> >>> ret_from_fork+0x599/0xb30
> >>> ret_from_fork_asm+0x1a/0x30
> >>>
> >>> -> #0 (fs_reclaim){+.+.}-{0:0}:
> >>> __lock_acquire+0x15a6/0x2cf0
> >>> lock_acquire+0x117/0x340
> >>> fs_reclaim_acquire+0x72/0x100
> >>> prepare_alloc_pages+0x152/0x650
> >>> __alloc_frozen_pages_noprof+0x123/0x370
> >>> __alloc_pages_noprof+0xa/0x30
> >>> pcpu_populate_chunk+0x182/0xb30
> >>> pcpu_alloc_noprof+0xcb6/0x1750
> >>> xt_percpu_counter_alloc+0x161/0x220
> >>> translate_table+0x1323/0x2040
> >>> ip6t_register_table+0x106/0x7d0
> >>> ip6table_nat_table_init+0x43/0x2e0
> >>> xt_find_table_lock+0x30c/0x3e0
> >>> xt_request_find_table_lock+0x26/0x100
> >>> do_ip6t_get_ctl+0x730/0x1180
> >>> nf_getsockopt+0x26e/0x290
> >>> ipv6_getsockopt+0x1ed/0x290
> >>> do_sock_getsockopt+0x2b4/0x3d0
> >>> __x64_sys_getsockopt+0x1a5/0x250
> >>> do_syscall_64+0xfa/0xf80
> >>> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> >>>
> >>> other info that might help us debug this:
> >>>
> >>> Chain exists of:
> >>> fs_reclaim --> &q->q_usage_counter(io)#17 --> pcpu_alloc_mutex
> >>>
> >>> Possible unsafe locking scenario:
> >>>
> >>> CPU0 CPU1
> >>> ---- ----
> >>> lock(pcpu_alloc_mutex);
> >>> lock(&q->q_usage_counter(io)#17);
> >>> lock(pcpu_alloc_mutex);
> >>> lock(fs_reclaim);
> >> This does not look like introduced by this set, wbt_init() will hold
> >> pcpu_alloc_mutex, and it can be called with queue frozen without this
> >> set.
> > It is introduced by your patch 6 in which blk_mq_freeze_queue() is added
> > before calling wbt_init() from wbt_enable_default(); that is what triggers
> > triggered.
>
> Yes, I know this, I mean it's the same before this set from queue_wb_lat_store(),
> where freeze queue is already called before wbt_init(), and this is not a new
> problem.
The point is that wbt_init() won't be called anymore from sysfs if it is
done in blk_register_queue(), which is the default setting.
Thanks,
Ming
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [syzbot ci] Re: blk-mq: fix possible deadlocks
2025-12-01 8:37 ` Ming Lei
@ 2025-12-01 8:41 ` Yu Kuai
0 siblings, 0 replies; 18+ messages in thread
From: Yu Kuai @ 2025-12-01 8:41 UTC (permalink / raw)
To: Ming Lei
Cc: syzbot ci, axboe, bvanassche, linux-block, nilay, tj, syzbot,
syzkaller-bugs, Yu Kuai
Hi,
在 2025/12/1 16:37, Ming Lei 写道:
> On Mon, Dec 01, 2025 at 12:43:03PM +0800, Yu Kuai wrote:
>> Hi,
>>
>> 在 2025/12/1 8:26, Ming Lei 写道:
>>> On Mon, Dec 01, 2025 at 03:50:22AM +0800, Yu Kuai wrote:
>>>> Hi,
>>>>
>>>> 在 2025/11/30 18:09, syzbot ci 写道:
>>>>> syzbot ci has tested the following series
>>>>>
>>>>> [v3] blk-mq: fix possible deadlocks
>>>>> https://lore.kernel.org/all/20251130024349.2302128-1-yukuai@fnnas.com
>>>>> * [PATCH v3 01/10] blk-mq-debugfs: factor out a helper to register debugfs for all rq_qos
>>>>> * [PATCH v3 02/10] blk-rq-qos: fix possible debugfs_mutex deadlock
>>>>> * [PATCH v3 03/10] blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static
>>>>> * [PATCH v3 04/10] blk-mq-debugfs: warn about possible deadlock
>>>>> * [PATCH v3 05/10] block/blk-rq-qos: add a new helper rq_qos_add_frozen()
>>>>> * [PATCH v3 06/10] blk-wbt: fix incorrect lock order for rq_qos_mutex and freeze queue
>>>>> * [PATCH v3 07/10] blk-iocost: fix incorrect lock order for rq_qos_mutex and freeze queue
>>>>> * [PATCH v3 08/10] blk-iolatency: fix incorrect lock order for rq_qos_mutex and freeze queue
>>>>> * [PATCH v3 09/10] blk-throttle: remove useless queue frozen
>>>>> * [PATCH v3 10/10] block/blk-rq-qos: cleanup rq_qos_add()
>>>>>
>>>>> and found the following issue:
>>>>> possible deadlock in pcpu_alloc_noprof
>>>>>
>>>>> Full report is available here:
>>>>> https://ci.syzbot.org/series/1aec77f0-c53f-4b3b-93fb-b3853983b6bd
>>>>>
>>>>> ***
>>>>>
>>>>> possible deadlock in pcpu_alloc_noprof
>>>>>
>>>>> tree: linux-next
>>>>> URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next
>>>>> base: 7d31f578f3230f3b7b33b0930b08f9afd8429817
>>>>> arch: amd64
>>>>> compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
>>>>> config: https://ci.syzbot.org/builds/70dca9e4-6667-4930-9024-150d656e503e/config
>>>>>
>>>>> soft_limit_in_bytes is deprecated and will be removed. Please report your usecase to linux-mm@kvack.org if you depend on this functionality.
>>>>> ======================================================
>>>>> WARNING: possible circular locking dependency detected
>>>>> syzkaller #0 Not tainted
>>>>> ------------------------------------------------------
>>>>> syz-executor/6047 is trying to acquire lock:
>>>>> ffffffff8e04f760 (fs_reclaim){+.+.}-{0:0}, at: prepare_alloc_pages+0x152/0x650
>>>>>
>>>>> but task is already holding lock:
>>>>> ffffffff8e02dde8 (pcpu_alloc_mutex){+.+.}-{4:4}, at: pcpu_alloc_noprof+0x25b/0x1750
>>>>>
>>>>> which lock already depends on the new lock.
>>>>>
>>>>>
>>>>> the existing dependency chain (in reverse order) is:
>>>>>
>>>>> -> #2 (pcpu_alloc_mutex){+.+.}-{4:4}:
>>>>> __mutex_lock+0x187/0x1350
>>>>> pcpu_alloc_noprof+0x25b/0x1750
>>>>> blk_stat_alloc_callback+0xd5/0x220
>>>>> wbt_init+0xa3/0x500
>>>>> wbt_enable_default+0x25d/0x350
>>>>> blk_register_queue+0x36a/0x3f0
>>>>> __add_disk+0x677/0xd50
>>>>> add_disk_fwnode+0xfc/0x480
>>>>> loop_add+0x7f0/0xad0
>>>>> loop_init+0xd9/0x170
>>>>> do_one_initcall+0x1fb/0x820
>>>>> do_initcall_level+0x104/0x190
>>>>> do_initcalls+0x59/0xa0
>>>>> kernel_init_freeable+0x334/0x4b0
>>>>> kernel_init+0x1d/0x1d0
>>>>> ret_from_fork+0x599/0xb30
>>>>> ret_from_fork_asm+0x1a/0x30
>>>>>
>>>>> -> #1 (&q->q_usage_counter(io)#17){++++}-{0:0}:
>>>>> blk_alloc_queue+0x538/0x620
>>>>> __blk_mq_alloc_disk+0x15c/0x340
>>>>> loop_add+0x411/0xad0
>>>>> loop_init+0xd9/0x170
>>>>> do_one_initcall+0x1fb/0x820
>>>>> do_initcall_level+0x104/0x190
>>>>> do_initcalls+0x59/0xa0
>>>>> kernel_init_freeable+0x334/0x4b0
>>>>> kernel_init+0x1d/0x1d0
>>>>> ret_from_fork+0x599/0xb30
>>>>> ret_from_fork_asm+0x1a/0x30
>>>>>
>>>>> -> #0 (fs_reclaim){+.+.}-{0:0}:
>>>>> __lock_acquire+0x15a6/0x2cf0
>>>>> lock_acquire+0x117/0x340
>>>>> fs_reclaim_acquire+0x72/0x100
>>>>> prepare_alloc_pages+0x152/0x650
>>>>> __alloc_frozen_pages_noprof+0x123/0x370
>>>>> __alloc_pages_noprof+0xa/0x30
>>>>> pcpu_populate_chunk+0x182/0xb30
>>>>> pcpu_alloc_noprof+0xcb6/0x1750
>>>>> xt_percpu_counter_alloc+0x161/0x220
>>>>> translate_table+0x1323/0x2040
>>>>> ip6t_register_table+0x106/0x7d0
>>>>> ip6table_nat_table_init+0x43/0x2e0
>>>>> xt_find_table_lock+0x30c/0x3e0
>>>>> xt_request_find_table_lock+0x26/0x100
>>>>> do_ip6t_get_ctl+0x730/0x1180
>>>>> nf_getsockopt+0x26e/0x290
>>>>> ipv6_getsockopt+0x1ed/0x290
>>>>> do_sock_getsockopt+0x2b4/0x3d0
>>>>> __x64_sys_getsockopt+0x1a5/0x250
>>>>> do_syscall_64+0xfa/0xf80
>>>>> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>>>>>
>>>>> other info that might help us debug this:
>>>>>
>>>>> Chain exists of:
>>>>> fs_reclaim --> &q->q_usage_counter(io)#17 --> pcpu_alloc_mutex
>>>>>
>>>>> Possible unsafe locking scenario:
>>>>>
>>>>> CPU0 CPU1
>>>>> ---- ----
>>>>> lock(pcpu_alloc_mutex);
>>>>> lock(&q->q_usage_counter(io)#17);
>>>>> lock(pcpu_alloc_mutex);
>>>>> lock(fs_reclaim);
>>>> This does not look like it was introduced by this set; wbt_init() will
>>>> hold pcpu_alloc_mutex, and it can be called with the queue frozen even
>>>> without this set.
>>> It is introduced by your patch 6, in which blk_mq_freeze_queue() is added
>>> before calling wbt_init() from wbt_enable_default(); that is when the
>>> warning is triggered.
>> Yes, I know this; I mean it's the same before this set via queue_wb_lat_store(),
>> where the queue is already frozen before wbt_init() is called, so this is
>> not a new problem.
> The point is that wbt_init() won't be called from sysfs any more if it is
> already done in blk_register_queue(), which is the default setting.
That is the case only if wbt is enabled by default; however, there is a config
option, CONFIG_BLK_WBT_MQ, to disable wbt by default, and in that case
wbt_init() can still be called from sysfs.
Anyway, I already sent a new version to fix this. :)
>
> Thanks,
> Ming
>
>
--
Thanks,
Kuai
^ permalink raw reply [flat|nested] 18+ messages in thread
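The fix direction discussed in this exchange, performing the percpu/GFP_KERNEL allocations before freezing the queue so that only the rq_qos attachment happens while frozen, can be sketched roughly as follows. This is illustrative pseudocode, not the actual patch: wbt_alloc() is a hypothetical helper named only for this sketch, while rq_qos_add_frozen() is the helper introduced by patch 5 of the series.

```c
/* Problematic order (simplified): allocate percpu memory under a frozen
 * queue. pcpu_alloc_mutex is taken while q_usage_counter is held, and
 * percpu allocation may enter fs_reclaim, closing the reported cycle. */
blk_mq_freeze_queue(q);
wbt_init(q);		/* -> blk_stat_alloc_callback() -> pcpu_alloc_mutex */
blk_mq_unfreeze_queue(q);

/* Safer order: perform all allocations first, then freeze only for the
 * attachment that actually needs the queue quiesced. */
rwb = wbt_alloc(q);	/* hypothetical: all allocation happens here */
if (!rwb)
	return -ENOMEM;
blk_mq_freeze_queue(q);
rq_qos_add_frozen(q, &rwb->rqos);	/* no allocation while frozen */
blk_mq_unfreeze_queue(q);
```

With this ordering, pcpu_alloc_mutex is never taken while q_usage_counter is held, so the q_usage_counter -> pcpu_alloc_mutex edge from the report never forms.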
end of thread, other threads:[~2025-12-01 8:41 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-11-30 2:43 [PATCH v3 00/10] blk-mq: fix possible deadlocks Yu Kuai
2025-11-30 2:43 ` [PATCH v3 01/10] blk-mq-debugfs: factor out a helper to register debugfs for all rq_qos Yu Kuai
2025-11-30 2:43 ` [PATCH v3 02/10] blk-rq-qos: fix possible debugfs_mutex deadlock Yu Kuai
2025-11-30 2:43 ` [PATCH v3 03/10] blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static Yu Kuai
2025-11-30 2:43 ` [PATCH v3 04/10] blk-mq-debugfs: warn about possible deadlock Yu Kuai
2025-11-30 2:43 ` [PATCH v3 05/10] block/blk-rq-qos: add a new helper rq_qos_add_frozen() Yu Kuai
2025-11-30 2:43 ` [PATCH v3 06/10] blk-wbt: fix incorrect lock order for rq_qos_mutex and freeze queue Yu Kuai
2025-12-01 0:29 ` Ming Lei
2025-11-30 2:43 ` [PATCH v3 07/10] blk-iocost: " Yu Kuai
2025-11-30 2:43 ` [PATCH v3 08/10] blk-iolatency: " Yu Kuai
2025-11-30 2:43 ` [PATCH v3 09/10] blk-throttle: remove useless queue frozen Yu Kuai
2025-11-30 2:43 ` [PATCH v3 10/10] block/blk-rq-qos: cleanup rq_qos_add() Yu Kuai
2025-11-30 10:09 ` [syzbot ci] Re: blk-mq: fix possible deadlocks syzbot ci
2025-11-30 19:50 ` Yu Kuai
2025-12-01 0:26 ` Ming Lei
2025-12-01 4:43 ` Yu Kuai
2025-12-01 8:37 ` Ming Lei
2025-12-01 8:41 ` Yu Kuai