From: Yu Kuai <yukuai@kernel.org>
To: Tejun Heo <tj@kernel.org>, Josef Bacik <josef@toxicpanda.com>,
Jens Axboe <axboe@kernel.dk>
Cc: Zheng Qixing <zhengqixing@huawei.com>,
Christoph Hellwig <hch@lst.de>,
Tang Yizhou <yizhou.tang@shopee.com>,
Nilay Shroff <nilay@linux.ibm.com>,
Ming Lei <ming.lei@redhat.com>,
cgroups@vger.kernel.org, linux-block@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: [PATCH v2 2/4] blk-cgroup: fix race between policy activation and blkg destruction
Date: Wed, 24 Jun 2026 14:46:23 +0800 [thread overview]
Message-ID: <20260624064625.1743650-5-yukuai@kernel.org> (raw)
In-Reply-To: <20260624064625.1743650-1-yukuai@kernel.org>
From: Zheng Qixing <zhengqixing@huawei.com>
When switching an IO scheduler on a block device, blkcg_activate_policy()
allocates blkg_policy_data (pd) for all blkgs attached to the queue.
However, blkcg_activate_policy() may race with concurrent blkcg deletion,
leading to use-after-free and memory leak issues.
The use-after-free occurs in the following race:
T1 (blkcg_activate_policy):
- Successfully allocates pd for blkg1 (loop0->queue, blkcgA)
- Fails to allocate pd for blkg2 (loop0->queue, blkcgB)
- Enters the enomem rollback path to release blkg1 resources
T2 (blkcg deletion):
- blkcgA is deleted concurrently
- blkg1 is freed via blkg_free_workfn()
- blkg1->pd is freed
T1 (continued):
- Rollback path accesses blkg1->pd->online after pd is freed
- Triggers use-after-free
In addition, blkg_free_workfn() frees pd before removing the blkg from
q->blkg_list. This allows blkcg_activate_policy() to allocate a new pd
for a blkg that is being destroyed, leaving the newly allocated pd
unreachable when the blkg is finally freed.
Fix these races by extending blkcg_mutex coverage to serialize
blkcg_activate_policy() rollback and blkg destruction, ensuring pd
lifecycle is synchronized with blkg list visibility.
Fixes: f1c006f1c685 ("blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()")
Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>
Reviewed-by: Tang Yizhou <yizhou.tang@shopee.com>
Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
block/blk-cgroup.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 7baccfb690fe..f7e788a7fe95 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1563,10 +1563,12 @@ int blkcg_activate_policy(struct gendisk *disk, const struct blkcg_policy *pol)
if (WARN_ON_ONCE(!pol->pd_alloc_fn || !pol->pd_free_fn))
return -EINVAL;
if (queue_is_mq(q))
memflags = blk_mq_freeze_queue(q);
+
+ mutex_lock(&q->blkcg_mutex);
retry:
spin_lock_irq(&q->queue_lock);
/* blkg_list is pushed at the head, reverse walk to initialize parents first */
list_for_each_entry_reverse(blkg, &q->blkg_list, q_node) {
@@ -1625,10 +1627,11 @@ int blkcg_activate_policy(struct gendisk *disk, const struct blkcg_policy *pol)
__set_bit(pol->plid, q->blkcg_pols);
ret = 0;
spin_unlock_irq(&q->queue_lock);
out:
+ mutex_unlock(&q->blkcg_mutex);
if (queue_is_mq(q))
blk_mq_unfreeze_queue(q, memflags);
if (pinned_blkg)
blkg_put(pinned_blkg);
if (pd_prealloc)
--
2.51.0
next prev parent reply other threads:[~2026-06-24 6:46 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-24 6:46 [PATCH 1/2] md/linear: add fault-tolerant mode for unraid-like setups Yu Kuai
2026-06-24 6:46 ` [PATCH 2/2] ext4: add unraid mount option for single-disk-per-group mode Yu Kuai
2026-06-24 6:46 ` [PATCH v2 0/4] blk-cgroup: fix blkg list and policy data races Yu Kuai
2026-06-24 6:46 ` [PATCH v2 1/4] blk-cgroup: protect q->blkg_list iteration in blkg_destroy_all() with blkcg_mutex Yu Kuai
2026-06-24 6:46 ` Yu Kuai [this message]
2026-06-25 15:08 ` [PATCH v2 2/4] blk-cgroup: fix race between policy activation and blkg destruction Nilay Shroff
2026-06-26 1:50 ` yu kuai
2026-06-26 1:52 ` yu kuai
2026-06-26 6:12 ` Nilay Shroff
2026-06-27 4:13 ` yu kuai
2026-06-24 6:46 ` [PATCH v2 3/4] blk-cgroup: skip dying blkg in blkcg_activate_policy() Yu Kuai
2026-06-24 6:46 ` [PATCH v2 4/4] blk-cgroup: factor policy pd teardown loop into helper Yu Kuai
2026-06-24 6:55 ` [PATCH 1/2] md/linear: add fault-tolerant mode for unraid-like setups yu kuai
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260624064625.1743650-5-yukuai@kernel.org \
--to=yukuai@kernel.org \
--cc=axboe@kernel.dk \
--cc=cgroups@vger.kernel.org \
--cc=hch@lst.de \
--cc=josef@toxicpanda.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=ming.lei@redhat.com \
--cc=nilay@linux.ibm.com \
--cc=tj@kernel.org \
--cc=yizhou.tang@shopee.com \
--cc=zhengqixing@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox