From: Yu Kuai <yukuai@fnnas.com>
To: Tejun Heo <tj@kernel.org>, Josef Bacik <josef@toxicpanda.com>,
Jens Axboe <axboe@kernel.dk>
Cc: cgroups@vger.kernel.org, linux-block@vger.kernel.org,
linux-kernel@vger.kernel.org,
Zheng Qixing <zhengqixing@huawei.com>,
Ming Lei <ming.lei@redhat.com>,
Nilay Shroff <nilay@linux.ibm.com>
Subject: [PATCH v3 7/7] blk-rq-qos: move rq_qos_mutex acquisition inside rq_qos_add/del
Date: Wed, 4 Mar 2026 15:38:08 +0800
Message-ID: <20260304073809.3438679-8-yukuai@fnnas.com>
In-Reply-To: <20260304073809.3438679-1-yukuai@fnnas.com>
The current rq_qos_mutex handling follows an awkward pattern: callers
must acquire the mutex before calling rq_qos_add()/rq_qos_del(), and
blkg_conf_open_bdev_frozen() has to release and re-acquire the mutex
around queue freezing to maintain the proper locking order (freeze the
queue before taking the mutex).
Moreover, with rq_qos_mutex held after blkg_conf_prep(), several
deadlock scenarios are possible:
- allocating memory with GFP_KERNEL, as in blk_throtl_init();
- allocating percpu memory, as in pd_alloc_fn() for iocost/iolatency.
This patch refactors the locking by:
1. Moving queue freeze and rq_qos_mutex acquisition inside
rq_qos_add()/rq_qos_del(), with the correct order: freeze first,
then acquire mutex.
2. Removing external mutex handling from wbt_init() since rq_qos_add()
now handles it internally.
3. Removing rq_qos_mutex handling from blkg_conf_open_bdev() entirely,
making it only responsible for parsing MAJ:MIN and opening the bdev.
4. Removing blkg_conf_open_bdev_frozen() and blkg_conf_exit_frozen()
functions which are no longer needed.
5. Updating ioc_qos_write() to use the simpler blkg_conf_open_bdev()
and blkg_conf_exit() functions.
This eliminates the release-and-reacquire pattern and makes
rq_qos_add()/rq_qos_del() self-contained, which is cleaner and reduces
complexity. Each function now properly manages its own locking with
the correct order: queue freeze → mutex acquire → modify → mutex
release → queue unfreeze.
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
block/blk-cgroup.c | 50 -------------------------------------------
block/blk-cgroup.h | 2 --
block/blk-iocost.c | 11 ++++------
block/blk-iolatency.c | 5 -----
block/blk-rq-qos.c | 31 ++++++++++++++++-----------
block/blk-wbt.c | 2 --
6 files changed, 22 insertions(+), 79 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 1620be75f124..02ef8f60f759 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -802,10 +802,8 @@ int blkg_conf_open_bdev(struct blkg_conf_ctx *ctx)
return -ENODEV;
}
- mutex_lock(&bdev->bd_queue->rq_qos_mutex);
if (!disk_live(bdev->bd_disk)) {
blkdev_put_no_open(bdev);
- mutex_unlock(&bdev->bd_queue->rq_qos_mutex);
return -ENODEV;
}
@@ -813,38 +811,6 @@ int blkg_conf_open_bdev(struct blkg_conf_ctx *ctx)
ctx->bdev = bdev;
return 0;
}
-/*
- * Similar to blkg_conf_open_bdev, but additionally freezes the queue,
- * ensures the correct locking order between freeze queue and q->rq_qos_mutex.
- *
- * This function returns negative error on failure. On success it returns
- * memflags which must be saved and later passed to blkg_conf_exit_frozen
- * for restoring the memalloc scope.
- */
-unsigned long __must_check blkg_conf_open_bdev_frozen(struct blkg_conf_ctx *ctx)
-{
- int ret;
- unsigned long memflags;
-
- if (ctx->bdev)
- return -EINVAL;
-
- ret = blkg_conf_open_bdev(ctx);
- if (ret < 0)
- return ret;
- /*
- * At this point, we haven’t started protecting anything related to QoS,
- * so we release q->rq_qos_mutex here, which was first acquired in blkg_
- * conf_open_bdev. Later, we re-acquire q->rq_qos_mutex after freezing
- * the queue to maintain the correct locking order.
- */
- mutex_unlock(&ctx->bdev->bd_queue->rq_qos_mutex);
-
- memflags = blk_mq_freeze_queue(ctx->bdev->bd_queue);
- mutex_lock(&ctx->bdev->bd_queue->rq_qos_mutex);
-
- return memflags;
-}
/**
* blkg_conf_prep - parse and prepare for per-blkg config update
@@ -978,7 +944,6 @@ EXPORT_SYMBOL_GPL(blkg_conf_prep);
*/
void blkg_conf_exit(struct blkg_conf_ctx *ctx)
__releases(&ctx->bdev->bd_queue->queue_lock)
- __releases(&ctx->bdev->bd_queue->rq_qos_mutex)
{
if (ctx->blkg) {
spin_unlock_irq(&bdev_get_queue(ctx->bdev)->queue_lock);
@@ -986,7 +951,6 @@ void blkg_conf_exit(struct blkg_conf_ctx *ctx)
}
if (ctx->bdev) {
- mutex_unlock(&ctx->bdev->bd_queue->rq_qos_mutex);
blkdev_put_no_open(ctx->bdev);
ctx->body = NULL;
ctx->bdev = NULL;
@@ -994,20 +958,6 @@ void blkg_conf_exit(struct blkg_conf_ctx *ctx)
}
EXPORT_SYMBOL_GPL(blkg_conf_exit);
-/*
- * Similar to blkg_conf_exit, but also unfreezes the queue. Should be used
- * when blkg_conf_open_bdev_frozen is used to open the bdev.
- */
-void blkg_conf_exit_frozen(struct blkg_conf_ctx *ctx, unsigned long memflags)
-{
- if (ctx->bdev) {
- struct request_queue *q = ctx->bdev->bd_queue;
-
- blkg_conf_exit(ctx);
- blk_mq_unfreeze_queue(q, memflags);
- }
-}
-
static void blkg_iostat_add(struct blkg_iostat *dst, struct blkg_iostat *src)
{
int i;
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 1cce3294634d..d4e7f78ba545 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -219,11 +219,9 @@ struct blkg_conf_ctx {
void blkg_conf_init(struct blkg_conf_ctx *ctx, char *input);
int blkg_conf_open_bdev(struct blkg_conf_ctx *ctx);
-unsigned long blkg_conf_open_bdev_frozen(struct blkg_conf_ctx *ctx);
int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
struct blkg_conf_ctx *ctx);
void blkg_conf_exit(struct blkg_conf_ctx *ctx);
-void blkg_conf_exit_frozen(struct blkg_conf_ctx *ctx, unsigned long memflags);
/**
* bio_issue_as_root_blkg - see if this bio needs to be issued as root blkg
diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index ef543d163d46..104a9a9f563f 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -3220,16 +3220,13 @@ static ssize_t ioc_qos_write(struct kernfs_open_file *of, char *input,
u32 qos[NR_QOS_PARAMS];
bool enable, user;
char *body, *p;
- unsigned long memflags;
int ret;
blkg_conf_init(&ctx, input);
- memflags = blkg_conf_open_bdev_frozen(&ctx);
- if (IS_ERR_VALUE(memflags)) {
- ret = memflags;
+ ret = blkg_conf_open_bdev(&ctx);
+ if (ret)
goto err;
- }
body = ctx.body;
disk = ctx.bdev->bd_disk;
@@ -3346,14 +3343,14 @@ static ssize_t ioc_qos_write(struct kernfs_open_file *of, char *input,
blk_mq_unquiesce_queue(disk->queue);
- blkg_conf_exit_frozen(&ctx, memflags);
+ blkg_conf_exit(&ctx);
return nbytes;
einval:
spin_unlock_irq(&ioc->lock);
blk_mq_unquiesce_queue(disk->queue);
ret = -EINVAL;
err:
- blkg_conf_exit_frozen(&ctx, memflags);
+ blkg_conf_exit(&ctx);
return ret;
}
diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index f7434278cd29..3f454fb3ff51 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -842,11 +842,6 @@ static ssize_t iolatency_set_limit(struct kernfs_open_file *of, char *buf,
if (ret)
goto out;
- /*
- * blk_iolatency_init() may fail after rq_qos_add() succeeds which can
- * confuse iolat_rq_qos() test. Make the test and init atomic.
- */
- lockdep_assert_held(&ctx.bdev->bd_queue->rq_qos_mutex);
if (!iolat_rq_qos(ctx.bdev->bd_queue))
ret = blk_iolatency_init(ctx.bdev->bd_disk);
if (ret)
diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index 85cf74402a09..fe96183bcc75 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -327,8 +327,7 @@ int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
{
struct request_queue *q = disk->queue;
unsigned int memflags;
-
- lockdep_assert_held(&q->rq_qos_mutex);
+ int ret = 0;
rqos->disk = disk;
rqos->id = id;
@@ -337,20 +336,24 @@ int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
/*
* No IO can be in-flight when adding rqos, so freeze queue, which
* is fine since we only support rq_qos for blk-mq queue.
+ *
+ * Acquire rq_qos_mutex after freezing the queue to ensure proper
+ * locking order.
*/
memflags = blk_mq_freeze_queue(q);
+ mutex_lock(&q->rq_qos_mutex);
- if (rq_qos_id(q, rqos->id))
- goto ebusy;
- rqos->next = q->rq_qos;
- q->rq_qos = rqos;
- blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q);
+ if (rq_qos_id(q, rqos->id)) {
+ ret = -EBUSY;
+ } else {
+ rqos->next = q->rq_qos;
+ q->rq_qos = rqos;
+ blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q);
+ }
+ mutex_unlock(&q->rq_qos_mutex);
blk_mq_unfreeze_queue(q, memflags);
- return 0;
-ebusy:
- blk_mq_unfreeze_queue(q, memflags);
- return -EBUSY;
+ return ret;
}
void rq_qos_del(struct rq_qos *rqos)
@@ -359,9 +362,9 @@ void rq_qos_del(struct rq_qos *rqos)
struct rq_qos **cur;
unsigned int memflags;
- lockdep_assert_held(&q->rq_qos_mutex);
-
memflags = blk_mq_freeze_queue(q);
+ mutex_lock(&q->rq_qos_mutex);
+
for (cur = &q->rq_qos; *cur; cur = &(*cur)->next) {
if (*cur == rqos) {
*cur = rqos->next;
@@ -370,5 +373,7 @@ void rq_qos_del(struct rq_qos *rqos)
}
if (!q->rq_qos)
blk_queue_flag_clear(QUEUE_FLAG_QOS_ENABLED, q);
+
+ mutex_unlock(&q->rq_qos_mutex);
blk_mq_unfreeze_queue(q, memflags);
}
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index 6dba71e87387..dde03b9ea074 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -961,9 +961,7 @@ static int wbt_init(struct gendisk *disk, struct rq_wb *rwb)
/*
* Assign rwb and add the stats callback.
*/
- mutex_lock(&q->rq_qos_mutex);
ret = rq_qos_add(&rwb->rqos, disk, RQ_QOS_WBT, &wbt_rqos_ops);
- mutex_unlock(&q->rq_qos_mutex);
if (ret)
return ret;
--
2.51.0