* [PATCH 1/5] blk-cgroup: protect q->blkg_list iteration in blkg_destroy_all() with blkcg_mutex
2026-06-03 13:27 [PATCH 0/5] blk-cgroup: fix blkg list and policy data races Yu Kuai
@ 2026-06-03 13:27 ` Yu Kuai
2026-06-03 13:27 ` [PATCH 2/5] bfq: protect q->blkg_list iteration in bfq_end_wr_async() " Yu Kuai
` (5 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Yu Kuai @ 2026-06-03 13:27 UTC (permalink / raw)
To: Jens Axboe
Cc: Tejun Heo, Josef Bacik, Ming Lei, Nilay Shroff, Bart Van Assche,
linux-block, cgroups, linux-kernel
blkg_destroy_all() iterates q->blkg_list without holding blkcg_mutex,
which can race with blkg_free_workfn() that removes blkgs from the list
while holding blkcg_mutex.
Add blkcg_mutex protection around the q->blkg_list iteration to prevent
potential list corruption or use-after-free issues.
Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
block/blk-cgroup.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 554c87bb4a86..a98a22e06fd1 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -573,10 +573,11 @@ static void blkg_destroy_all(struct gendisk *disk)
struct blkcg_gq *blkg;
int count = BLKG_DESTROY_BATCH_SIZE;
int i;
restart:
+ mutex_lock(&q->blkcg_mutex);
spin_lock_irq(&q->queue_lock);
list_for_each_entry(blkg, &q->blkg_list, q_node) {
struct blkcg *blkcg = blkg->blkcg;
if (hlist_unhashed(&blkg->blkcg_node))
@@ -591,10 +592,11 @@ static void blkg_destroy_all(struct gendisk *disk)
* it when a batch of blkgs are destroyed.
*/
if (!(--count)) {
count = BLKG_DESTROY_BATCH_SIZE;
spin_unlock_irq(&q->queue_lock);
+ mutex_unlock(&q->blkcg_mutex);
cond_resched();
goto restart;
}
}
@@ -610,10 +612,11 @@ static void blkg_destroy_all(struct gendisk *disk)
__clear_bit(pol->plid, q->blkcg_pols);
}
q->root_blkg = NULL;
spin_unlock_irq(&q->queue_lock);
+ mutex_unlock(&q->blkcg_mutex);
wake_up_var(&q->root_blkg);
}
static void blkg_iostat_set(struct blkg_iostat *dst, struct blkg_iostat *src)
--
2.51.0
^ permalink raw reply related [flat|nested] 11+ messages in thread* [PATCH 2/5] bfq: protect q->blkg_list iteration in bfq_end_wr_async() with blkcg_mutex
2026-06-03 13:27 [PATCH 0/5] blk-cgroup: fix blkg list and policy data races Yu Kuai
2026-06-03 13:27 ` [PATCH 1/5] blk-cgroup: protect q->blkg_list iteration in blkg_destroy_all() with blkcg_mutex Yu Kuai
@ 2026-06-03 13:27 ` Yu Kuai
2026-06-04 17:31 ` Nilay Shroff
2026-06-03 13:27 ` [PATCH 3/5] blk-cgroup: fix race between policy activation and blkg destruction Yu Kuai
` (4 subsequent siblings)
6 siblings, 1 reply; 11+ messages in thread
From: Yu Kuai @ 2026-06-03 13:27 UTC (permalink / raw)
To: Jens Axboe
Cc: Tejun Heo, Josef Bacik, Ming Lei, Nilay Shroff, Bart Van Assche,
linux-block, cgroups, linux-kernel
bfq_end_wr_async() iterates q->blkg_list while only holding bfqd->lock,
but not blkcg_mutex. This can race with blkg_free_workfn() that removes
blkgs from the list while holding blkcg_mutex.
Add blkcg_mutex protection in bfq_end_wr() before taking bfqd->lock to
ensure proper synchronization when iterating q->blkg_list.
Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
block/bfq-cgroup.c | 3 ++-
block/bfq-iosched.c | 6 ++++++
2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index 37ab70930c8d..f765e767d36a 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -939,11 +939,12 @@ void bfq_end_wr_async(struct bfq_data *bfqd)
struct blkcg_gq *blkg;
list_for_each_entry(blkg, &bfqd->queue->blkg_list, q_node) {
struct bfq_group *bfqg = blkg_to_bfqg(blkg);
- bfq_end_wr_async_queues(bfqd, bfqg);
+ if (bfqg)
+ bfq_end_wr_async_queues(bfqd, bfqg);
}
bfq_end_wr_async_queues(bfqd, bfqd->root_group);
}
static int bfq_io_show_weight_legacy(struct seq_file *sf, void *v)
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 141c602d5e85..42ccfd0c6140 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2643,10 +2643,13 @@ void bfq_end_wr_async_queues(struct bfq_data *bfqd,
static void bfq_end_wr(struct bfq_data *bfqd)
{
struct bfq_queue *bfqq;
int i;
+#ifdef CONFIG_BFQ_GROUP_IOSCHED
+ mutex_lock(&bfqd->queue->blkcg_mutex);
+#endif
spin_lock_irq(&bfqd->lock);
for (i = 0; i < bfqd->num_actuators; i++) {
list_for_each_entry(bfqq, &bfqd->active_list[i], bfqq_list)
bfq_bfqq_end_wr(bfqq);
@@ -2654,10 +2657,13 @@ static void bfq_end_wr(struct bfq_data *bfqd)
list_for_each_entry(bfqq, &bfqd->idle_list, bfqq_list)
bfq_bfqq_end_wr(bfqq);
bfq_end_wr_async(bfqd);
spin_unlock_irq(&bfqd->lock);
+#ifdef CONFIG_BFQ_GROUP_IOSCHED
+ mutex_unlock(&bfqd->queue->blkcg_mutex);
+#endif
}
static sector_t bfq_io_struct_pos(void *io_struct, bool request)
{
if (request)
--
2.51.0
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [PATCH 2/5] bfq: protect q->blkg_list iteration in bfq_end_wr_async() with blkcg_mutex
2026-06-03 13:27 ` [PATCH 2/5] bfq: protect q->blkg_list iteration in bfq_end_wr_async() " Yu Kuai
@ 2026-06-04 17:31 ` Nilay Shroff
2026-06-08 2:48 ` yu kuai
0 siblings, 1 reply; 11+ messages in thread
From: Nilay Shroff @ 2026-06-04 17:31 UTC (permalink / raw)
To: Yu Kuai, Jens Axboe
Cc: Tejun Heo, Josef Bacik, Ming Lei, Bart Van Assche, linux-block,
cgroups, linux-kernel
On 6/3/26 6:57 PM, Yu Kuai wrote:
> bfq_end_wr_async() iterates q->blkg_list while only holding bfqd->lock,
> but not blkcg_mutex. This can race with blkg_free_workfn() that removes
> blkgs from the list while holding blkcg_mutex.
>
> Add blkcg_mutex protection in bfq_end_wr() before taking bfqd->lock to
> ensure proper synchronization when iterating q->blkg_list.
>
> Signed-off-by: Yu Kuai <yukuai@fygo.io>
> ---
> block/bfq-cgroup.c | 3 ++-
> block/bfq-iosched.c | 6 ++++++
> 2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
> index 37ab70930c8d..f765e767d36a 100644
> --- a/block/bfq-cgroup.c
> +++ b/block/bfq-cgroup.c
> @@ -939,11 +939,12 @@ void bfq_end_wr_async(struct bfq_data *bfqd)
> struct blkcg_gq *blkg;
>
> list_for_each_entry(blkg, &bfqd->queue->blkg_list, q_node) {
> struct bfq_group *bfqg = blkg_to_bfqg(blkg);
>
> - bfq_end_wr_async_queues(bfqd, bfqg);
> + if (bfqg)
> + bfq_end_wr_async_queues(bfqd, bfqg);
> }
> bfq_end_wr_async_queues(bfqd, bfqd->root_group);
> }
>
> static int bfq_io_show_weight_legacy(struct seq_file *sf, void *v)
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 141c602d5e85..42ccfd0c6140 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -2643,10 +2643,13 @@ void bfq_end_wr_async_queues(struct bfq_data *bfqd,
> static void bfq_end_wr(struct bfq_data *bfqd)
> {
> struct bfq_queue *bfqq;
> int i;
>
> +#ifdef CONFIG_BFQ_GROUP_IOSCHED
> + mutex_lock(&bfqd->queue->blkcg_mutex);
> +#endif
> spin_lock_irq(&bfqd->lock);
>
> for (i = 0; i < bfqd->num_actuators; i++) {
> list_for_each_entry(bfqq, &bfqd->active_list[i], bfqq_list)
> bfq_bfqq_end_wr(bfqq);
> @@ -2654,10 +2657,13 @@ static void bfq_end_wr(struct bfq_data *bfqd)
> list_for_each_entry(bfqq, &bfqd->idle_list, bfqq_list)
> bfq_bfqq_end_wr(bfqq);
> bfq_end_wr_async(bfqd);
>
> spin_unlock_irq(&bfqd->lock);
> +#ifdef CONFIG_BFQ_GROUP_IOSCHED
> + mutex_unlock(&bfqd->queue->blkcg_mutex);
> +#endif
> }
The above change protects the q->blkg_list iteration in bfq_end_wr_async()
against list removal in blkg_free_workfn(). However the blkg insertion in
blkg_create() still doesn't use q->blkcg_mutex and so list traversal in
bfq_end_wr_async() may still race with blkg_create().
So I think we may also need to protect blkg insert in blkg_create() using
q->blkcg_mutex.
Thanks,
--Nilay
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH 2/5] bfq: protect q->blkg_list iteration in bfq_end_wr_async() with blkcg_mutex
2026-06-04 17:31 ` Nilay Shroff
@ 2026-06-08 2:48 ` yu kuai
0 siblings, 0 replies; 11+ messages in thread
From: yu kuai @ 2026-06-08 2:48 UTC (permalink / raw)
To: Nilay Shroff, Jens Axboe
Cc: Tejun Heo, Josef Bacik, Ming Lei, Bart Van Assche, linux-block,
cgroups, linux-kernel, yukuai
Hi,
在 2026/6/5 1:31, Nilay Shroff 写道:
> On 6/3/26 6:57 PM, Yu Kuai wrote:
>> bfq_end_wr_async() iterates q->blkg_list while only holding bfqd->lock,
>> but not blkcg_mutex. This can race with blkg_free_workfn() that removes
>> blkgs from the list while holding blkcg_mutex.
>>
>> Add blkcg_mutex protection in bfq_end_wr() before taking bfqd->lock to
>> ensure proper synchronization when iterating q->blkg_list.
>>
>> Signed-off-by: Yu Kuai <yukuai@fygo.io>
>> ---
>> block/bfq-cgroup.c | 3 ++-
>> block/bfq-iosched.c | 6 ++++++
>> 2 files changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
>> index 37ab70930c8d..f765e767d36a 100644
>> --- a/block/bfq-cgroup.c
>> +++ b/block/bfq-cgroup.c
>> @@ -939,11 +939,12 @@ void bfq_end_wr_async(struct bfq_data *bfqd)
>> struct blkcg_gq *blkg;
>> list_for_each_entry(blkg, &bfqd->queue->blkg_list, q_node) {
>> struct bfq_group *bfqg = blkg_to_bfqg(blkg);
>> - bfq_end_wr_async_queues(bfqd, bfqg);
>> + if (bfqg)
>> + bfq_end_wr_async_queues(bfqd, bfqg);
>> }
>> bfq_end_wr_async_queues(bfqd, bfqd->root_group);
>> }
>> static int bfq_io_show_weight_legacy(struct seq_file *sf, void *v)
>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>> index 141c602d5e85..42ccfd0c6140 100644
>> --- a/block/bfq-iosched.c
>> +++ b/block/bfq-iosched.c
>> @@ -2643,10 +2643,13 @@ void bfq_end_wr_async_queues(struct bfq_data
>> *bfqd,
>> static void bfq_end_wr(struct bfq_data *bfqd)
>> {
>> struct bfq_queue *bfqq;
>> int i;
>> +#ifdef CONFIG_BFQ_GROUP_IOSCHED
>> + mutex_lock(&bfqd->queue->blkcg_mutex);
>> +#endif
>> spin_lock_irq(&bfqd->lock);
>> for (i = 0; i < bfqd->num_actuators; i++) {
>> list_for_each_entry(bfqq, &bfqd->active_list[i], bfqq_list)
>> bfq_bfqq_end_wr(bfqq);
>> @@ -2654,10 +2657,13 @@ static void bfq_end_wr(struct bfq_data *bfqd)
>> list_for_each_entry(bfqq, &bfqd->idle_list, bfqq_list)
>> bfq_bfqq_end_wr(bfqq);
>> bfq_end_wr_async(bfqd);
>> spin_unlock_irq(&bfqd->lock);
>> +#ifdef CONFIG_BFQ_GROUP_IOSCHED
>> + mutex_unlock(&bfqd->queue->blkcg_mutex);
>> +#endif
>> }
>
> The above change protects the q->blkg_list iteration in
> bfq_end_wr_async()
> against list removal in blkg_free_workfn(). However the blkg insertion in
> blkg_create() still doesn't use q->blkcg_mutex and so list traversal in
> bfq_end_wr_async() may still race with blkg_create().
>
> So I think we may also need to protect blkg insert in blkg_create() using
> q->blkcg_mutex.
Yes, this is done in another huge patchset, because currently blkg_create()
is protected by queue_lock and can be called under rcu, code refactor will
be required.
I'll send the first set soon.
>
> Thanks,
> --Nilay
--
Thanks,
Kuai
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 3/5] blk-cgroup: fix race between policy activation and blkg destruction
2026-06-03 13:27 [PATCH 0/5] blk-cgroup: fix blkg list and policy data races Yu Kuai
2026-06-03 13:27 ` [PATCH 1/5] blk-cgroup: protect q->blkg_list iteration in blkg_destroy_all() with blkcg_mutex Yu Kuai
2026-06-03 13:27 ` [PATCH 2/5] bfq: protect q->blkg_list iteration in bfq_end_wr_async() " Yu Kuai
@ 2026-06-03 13:27 ` Yu Kuai
2026-06-03 13:27 ` [PATCH 4/5] blk-cgroup: skip dying blkg in blkcg_activate_policy() Yu Kuai
` (3 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Yu Kuai @ 2026-06-03 13:27 UTC (permalink / raw)
To: Jens Axboe
Cc: Tejun Heo, Josef Bacik, Ming Lei, Nilay Shroff, Bart Van Assche,
linux-block, cgroups, linux-kernel
From: Zheng Qixing <zhengqixing@huawei.com>
When switching an IO scheduler on a block device, blkcg_activate_policy()
allocates blkg_policy_data (pd) for all blkgs attached to the queue.
However, blkcg_activate_policy() may race with concurrent blkcg deletion,
leading to use-after-free and memory leak issues.
The use-after-free occurs in the following race:
T1 (blkcg_activate_policy):
- Successfully allocates pd for blkg1 (loop0->queue, blkcgA)
- Fails to allocate pd for blkg2 (loop0->queue, blkcgB)
- Enters the enomem rollback path to release blkg1 resources
T2 (blkcg deletion):
- blkcgA is deleted concurrently
- blkg1 is freed via blkg_free_workfn()
- blkg1->pd is freed
T1 (continued):
- Rollback path accesses blkg1->pd->online after pd is freed
- Triggers use-after-free
In addition, blkg_free_workfn() frees pd before removing the blkg from
q->blkg_list. This allows blkcg_activate_policy() to allocate a new pd
for a blkg that is being destroyed, leaving the newly allocated pd
unreachable when the blkg is finally freed.
Fix these races by extending blkcg_mutex coverage to serialize
blkcg_activate_policy() rollback and blkg destruction, ensuring pd
lifecycle is synchronized with blkg list visibility.
Fixes: f1c006f1c685 ("blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()")
Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>
Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
block/blk-cgroup.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index a98a22e06fd1..007dfc4f9c59 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1612,10 +1612,12 @@ int blkcg_activate_policy(struct gendisk *disk, const struct blkcg_policy *pol)
if (WARN_ON_ONCE(!pol->pd_alloc_fn || !pol->pd_free_fn))
return -EINVAL;
if (queue_is_mq(q))
memflags = blk_mq_freeze_queue(q);
+
+ mutex_lock(&q->blkcg_mutex);
retry:
spin_lock_irq(&q->queue_lock);
/* blkg_list is pushed at the head, reverse walk to initialize parents first */
list_for_each_entry_reverse(blkg, &q->blkg_list, q_node) {
@@ -1674,10 +1676,11 @@ int blkcg_activate_policy(struct gendisk *disk, const struct blkcg_policy *pol)
__set_bit(pol->plid, q->blkcg_pols);
ret = 0;
spin_unlock_irq(&q->queue_lock);
out:
+ mutex_unlock(&q->blkcg_mutex);
if (queue_is_mq(q))
blk_mq_unfreeze_queue(q, memflags);
if (pinned_blkg)
blkg_put(pinned_blkg);
if (pd_prealloc)
--
2.51.0
^ permalink raw reply related [flat|nested] 11+ messages in thread* [PATCH 4/5] blk-cgroup: skip dying blkg in blkcg_activate_policy()
2026-06-03 13:27 [PATCH 0/5] blk-cgroup: fix blkg list and policy data races Yu Kuai
` (2 preceding siblings ...)
2026-06-03 13:27 ` [PATCH 3/5] blk-cgroup: fix race between policy activation and blkg destruction Yu Kuai
@ 2026-06-03 13:27 ` Yu Kuai
2026-06-03 13:27 ` [PATCH 5/5] blk-cgroup: factor policy pd teardown loop into helper Yu Kuai
` (2 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Yu Kuai @ 2026-06-03 13:27 UTC (permalink / raw)
To: Jens Axboe
Cc: Tejun Heo, Josef Bacik, Ming Lei, Nilay Shroff, Bart Van Assche,
linux-block, cgroups, linux-kernel
From: Zheng Qixing <zhengqixing@huawei.com>
When switching IO schedulers on a block device, blkcg_activate_policy()
can race with concurrent blkcg deletion, leading to a use-after-free in
rcu_accelerate_cbs.
T1: T2:
blkg_destroy
kill(&blkg->refcnt) // blkg->refcnt=1->0
blkg_release // call_rcu(__blkg_release)
...
blkg_free_workfn
->pd_free_fn(pd)
elv_iosched_store
elevator_switch
...
iterate blkg list
blkg_get(blkg) // blkg->refcnt=0->1
list_del_init(&blkg->q_node)
blkg_put(pinned_blkg) // blkg->refcnt=1->0
blkg_release // call_rcu again
rcu_accelerate_cbs // uaf
Fix this by checking hlist_unhashed(&blkg->blkcg_node) before getting
a reference to the blkg. This is the same check used in blkg_destroy()
to detect if a blkg has already been destroyed. If the blkg is already
unhashed, skip processing it since it's being destroyed.
Fixes: f1c006f1c685 ("blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()")
Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>
Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
block/blk-cgroup.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 007dfc4f9c59..b479a9796fb3 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1623,10 +1623,12 @@ int blkcg_activate_policy(struct gendisk *disk, const struct blkcg_policy *pol)
list_for_each_entry_reverse(blkg, &q->blkg_list, q_node) {
struct blkg_policy_data *pd;
if (blkg->pd[pol->plid])
continue;
+ if (hlist_unhashed(&blkg->blkcg_node))
+ continue;
/* If prealloc matches, use it; otherwise try GFP_NOWAIT */
if (blkg == pinned_blkg) {
pd = pd_prealloc;
pd_prealloc = NULL;
--
2.51.0
^ permalink raw reply related [flat|nested] 11+ messages in thread* [PATCH 5/5] blk-cgroup: factor policy pd teardown loop into helper
2026-06-03 13:27 [PATCH 0/5] blk-cgroup: fix blkg list and policy data races Yu Kuai
` (3 preceding siblings ...)
2026-06-03 13:27 ` [PATCH 4/5] blk-cgroup: skip dying blkg in blkcg_activate_policy() Yu Kuai
@ 2026-06-03 13:27 ` Yu Kuai
2026-06-03 14:35 ` [PATCH 0/5] blk-cgroup: fix blkg list and policy data races Jens Axboe
2026-06-09 15:15 ` Yizhou Tang
6 siblings, 0 replies; 11+ messages in thread
From: Yu Kuai @ 2026-06-03 13:27 UTC (permalink / raw)
To: Jens Axboe
Cc: Tejun Heo, Josef Bacik, Ming Lei, Nilay Shroff, Bart Van Assche,
linux-block, cgroups, linux-kernel
From: Zheng Qixing <zhengqixing@huawei.com>
Move the teardown sequence which offlines and frees per-policy
blkg_policy_data (pd) into a helper for readability.
No functional change intended.
Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Yu Kuai <yukuai@fygo.io>
Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
block/blk-cgroup.c | 57 ++++++++++++++++++++++------------------------
1 file changed, 27 insertions(+), 30 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index b479a9796fb3..c75b2a103bbc 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1575,10 +1575,35 @@ struct cgroup_subsys io_cgrp_subsys = {
.depends_on = 1 << memory_cgrp_id,
#endif
};
EXPORT_SYMBOL_GPL(io_cgrp_subsys);
+/*
+ * Tear down per-blkg policy data for @pol on @q.
+ */
+static void blkcg_policy_teardown_pds(struct request_queue *q,
+ const struct blkcg_policy *pol)
+{
+ struct blkcg_gq *blkg;
+
+ list_for_each_entry(blkg, &q->blkg_list, q_node) {
+ struct blkcg *blkcg = blkg->blkcg;
+ struct blkg_policy_data *pd;
+
+ spin_lock(&blkcg->lock);
+ pd = blkg->pd[pol->plid];
+ if (pd) {
+ if (pd->online && pol->pd_offline_fn)
+ pol->pd_offline_fn(pd);
+ pd->online = false;
+ pol->pd_free_fn(pd);
+ blkg->pd[pol->plid] = NULL;
+ }
+ spin_unlock(&blkcg->lock);
+ }
+}
+
/**
* blkcg_activate_policy - activate a blkcg policy on a gendisk
* @disk: gendisk of interest
* @pol: blkcg policy to activate
*
@@ -1690,25 +1715,11 @@ int blkcg_activate_policy(struct gendisk *disk, const struct blkcg_policy *pol)
return ret;
enomem:
/* alloc failed, take down everything */
spin_lock_irq(&q->queue_lock);
- list_for_each_entry(blkg, &q->blkg_list, q_node) {
- struct blkcg *blkcg = blkg->blkcg;
- struct blkg_policy_data *pd;
-
- spin_lock(&blkcg->lock);
- pd = blkg->pd[pol->plid];
- if (pd) {
- if (pd->online && pol->pd_offline_fn)
- pol->pd_offline_fn(pd);
- pd->online = false;
- pol->pd_free_fn(pd);
- blkg->pd[pol->plid] = NULL;
- }
- spin_unlock(&blkcg->lock);
- }
+ blkcg_policy_teardown_pds(q, pol);
spin_unlock_irq(&q->queue_lock);
ret = -ENOMEM;
goto out;
}
EXPORT_SYMBOL_GPL(blkcg_activate_policy);
@@ -1723,11 +1734,10 @@ EXPORT_SYMBOL_GPL(blkcg_activate_policy);
*/
void blkcg_deactivate_policy(struct gendisk *disk,
const struct blkcg_policy *pol)
{
struct request_queue *q = disk->queue;
- struct blkcg_gq *blkg;
unsigned int memflags;
if (!blkcg_policy_enabled(q, pol))
return;
@@ -1736,24 +1746,11 @@ void blkcg_deactivate_policy(struct gendisk *disk,
mutex_lock(&q->blkcg_mutex);
spin_lock_irq(&q->queue_lock);
__clear_bit(pol->plid, q->blkcg_pols);
-
- list_for_each_entry(blkg, &q->blkg_list, q_node) {
- struct blkcg *blkcg = blkg->blkcg;
-
- spin_lock(&blkcg->lock);
- if (blkg->pd[pol->plid]) {
- if (blkg->pd[pol->plid]->online && pol->pd_offline_fn)
- pol->pd_offline_fn(blkg->pd[pol->plid]);
- pol->pd_free_fn(blkg->pd[pol->plid]);
- blkg->pd[pol->plid] = NULL;
- }
- spin_unlock(&blkcg->lock);
- }
-
+ blkcg_policy_teardown_pds(q, pol);
spin_unlock_irq(&q->queue_lock);
mutex_unlock(&q->blkcg_mutex);
if (queue_is_mq(q))
blk_mq_unfreeze_queue(q, memflags);
--
2.51.0
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [PATCH 0/5] blk-cgroup: fix blkg list and policy data races
2026-06-03 13:27 [PATCH 0/5] blk-cgroup: fix blkg list and policy data races Yu Kuai
` (4 preceding siblings ...)
2026-06-03 13:27 ` [PATCH 5/5] blk-cgroup: factor policy pd teardown loop into helper Yu Kuai
@ 2026-06-03 14:35 ` Jens Axboe
[not found] ` <189b8601-87a6-423a-b267-9d550c31c086@fnnas.com>
2026-06-09 15:15 ` Yizhou Tang
6 siblings, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2026-06-03 14:35 UTC (permalink / raw)
To: Yu Kuai
Cc: Tejun Heo, Josef Bacik, Ming Lei, Nilay Shroff, Bart Van Assche,
linux-block, cgroups, linux-kernel
On 6/3/26 7:27 AM, Yu Kuai wrote:
> This small series fixes races between blkg destruction, q->blkg_list
> iteration, and blkcg policy activation.
In spam:
Authentication-Results: mx.google.com; spf=pass (google.com: domain of srs0=eh6w=d7=fygo.io=yukuai@kernel.org designates 2600:3c04:e001:324:0:1991:8:25 as permitted sender) smtp.mailfrom="SRS0=EH6w=D7=fygo.io=yukuai@kernel.org"; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=QUARANTINE) header.from=fygo.io
if you don't get this sorted out, your messages will be missed as they
end up in spam for most people.
--
Jens Axboe
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH 0/5] blk-cgroup: fix blkg list and policy data races
2026-06-03 13:27 [PATCH 0/5] blk-cgroup: fix blkg list and policy data races Yu Kuai
` (5 preceding siblings ...)
2026-06-03 14:35 ` [PATCH 0/5] blk-cgroup: fix blkg list and policy data races Jens Axboe
@ 2026-06-09 15:15 ` Yizhou Tang
6 siblings, 0 replies; 11+ messages in thread
From: Yizhou Tang @ 2026-06-09 15:15 UTC (permalink / raw)
To: Yu Kuai
Cc: Jens Axboe, Tejun Heo, Josef Bacik, Ming Lei, Nilay Shroff,
Bart Van Assche, linux-block, cgroups, linux-kernel
On Wed, Jun 3, 2026 at 9:35 PM Yu Kuai <yukuai@fygo.io> wrote:
>
> This small series fixes races between blkg destruction, q->blkg_list
> iteration, and blkcg policy activation.
>
> The first two patches serialize q->blkg_list walks in blkg_destroy_all()
> and BFQ writeback weight-raising teardown with blkcg_mutex. The next two
> patches close policy activation races with concurrent blkg destruction,
> including skipping blkgs that are already dying. The final patch factors
> the common policy data teardown loop.
>
> This uses blkcg_mutex rather than extending queue_lock coverage because
> the races are about blkg list visibility and policy-data lifetime, not
> request-queue dispatch state. blkg_free_workfn() already uses
> blkcg_mutex to serialize policy-data freeing with policy deactivation
> and removes blkgs from q->blkg_list only after that teardown. Taking the
> same mutex around the remaining q->blkg_list walkers gives one sleepable
> serialization point for blkg lifetime, avoids adding more queue_lock
> nesting, and prepares the follow-up conversion that removes queue_lock
> from blkcg list protection entirely.
LGTM.
I had noticed some time ago that blkcg_activate_policy() did not take
q->blkcg_mutex, and this patchset nicely resolves my concern.
Reviewed-by: Tang Yizhou <yizhou.tang@shopee.com>
Best regards,
Yi
>
> Yu Kuai (2):
> blk-cgroup: protect q->blkg_list iteration in blkg_destroy_all() with
> blkcg_mutex
> bfq: protect q->blkg_list iteration in bfq_end_wr_async() with
> blkcg_mutex
>
> Zheng Qixing (3):
> blk-cgroup: fix race between policy activation and blkg destruction
> blk-cgroup: skip dying blkg in blkcg_activate_policy()
> blk-cgroup: factor policy pd teardown loop into helper
>
> block/bfq-cgroup.c | 3 ++-
> block/bfq-iosched.c | 6 +++++
> block/blk-cgroup.c | 65 ++++++++++++++++++++++++---------------------
> 3 files changed, 43 insertions(+), 31 deletions(-)
>
> --
> 2.51.0
>
^ permalink raw reply [flat|nested] 11+ messages in thread