[PATCH 0/5] blk-cgroup: fix blkg list and policy data races

Linux block layer
 help / color / mirror / Atom feed

* [PATCH 0/5] blk-cgroup: fix blkg list and policy data races
@ 2026-06-03 13:27 Yu Kuai
  2026-06-03 13:27 ` [PATCH 1/5] blk-cgroup: protect q->blkg_list iteration in blkg_destroy_all() with blkcg_mutex Yu Kuai
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Yu Kuai @ 2026-06-03 13:27 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Tejun Heo, Josef Bacik, Ming Lei, Nilay Shroff, Bart Van Assche,
	linux-block, cgroups, linux-kernel

This small series fixes races between blkg destruction, q->blkg_list
iteration, and blkcg policy activation.

The first two patches serialize q->blkg_list walks in blkg_destroy_all()
and BFQ writeback weight-raising teardown with blkcg_mutex. The next two
patches close policy activation races with concurrent blkg destruction,
including skipping blkgs that are already dying. The final patch factors
the common policy data teardown loop.

This uses blkcg_mutex rather than extending queue_lock coverage because
the races are about blkg list visibility and policy-data lifetime, not
request-queue dispatch state. blkg_free_workfn() already uses
blkcg_mutex to serialize policy-data freeing with policy deactivation
and removes blkgs from q->blkg_list only after that teardown. Taking the
same mutex around the remaining q->blkg_list walkers gives one sleepable
serialization point for blkg lifetime, avoids adding more queue_lock
nesting, and prepares the follow-up conversion that removes queue_lock
from blkcg list protection entirely.

Yu Kuai (2):
  blk-cgroup: protect q->blkg_list iteration in blkg_destroy_all() with
    blkcg_mutex
  bfq: protect q->blkg_list iteration in bfq_end_wr_async() with
    blkcg_mutex

Zheng Qixing (3):
  blk-cgroup: fix race between policy activation and blkg destruction
  blk-cgroup: skip dying blkg in blkcg_activate_policy()
  blk-cgroup: factor policy pd teardown loop into helper

 block/bfq-cgroup.c  |  3 ++-
 block/bfq-iosched.c |  6 +++++
 block/blk-cgroup.c  | 65 ++++++++++++++++++++++++---------------------
 3 files changed, 43 insertions(+), 31 deletions(-)

-- 
2.51.0

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/5] blk-cgroup: protect q->blkg_list iteration in blkg_destroy_all() with blkcg_mutex
  2026-06-03 13:27 [PATCH 0/5] blk-cgroup: fix blkg list and policy data races Yu Kuai
@ 2026-06-03 13:27 ` Yu Kuai
  2026-06-03 13:27 ` [PATCH 2/5] bfq: protect q->blkg_list iteration in bfq_end_wr_async() " Yu Kuai
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Yu Kuai @ 2026-06-03 13:27 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Tejun Heo, Josef Bacik, Ming Lei, Nilay Shroff, Bart Van Assche,
	linux-block, cgroups, linux-kernel

blkg_destroy_all() iterates q->blkg_list without holding blkcg_mutex,
which can race with blkg_free_workfn() that removes blkgs from the list
while holding blkcg_mutex.

Add blkcg_mutex protection around the q->blkg_list iteration to prevent
potential list corruption or use-after-free issues.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 block/blk-cgroup.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 554c87bb4a86..a98a22e06fd1 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -573,10 +573,11 @@ static void blkg_destroy_all(struct gendisk *disk)
 	struct blkcg_gq *blkg;
 	int count = BLKG_DESTROY_BATCH_SIZE;
 	int i;
 
 restart:
+	mutex_lock(&q->blkcg_mutex);
 	spin_lock_irq(&q->queue_lock);
 	list_for_each_entry(blkg, &q->blkg_list, q_node) {
 		struct blkcg *blkcg = blkg->blkcg;
 
 		if (hlist_unhashed(&blkg->blkcg_node))
@@ -591,10 +592,11 @@ static void blkg_destroy_all(struct gendisk *disk)
 		 * it when a batch of blkgs are destroyed.
 		 */
 		if (!(--count)) {
 			count = BLKG_DESTROY_BATCH_SIZE;
 			spin_unlock_irq(&q->queue_lock);
+			mutex_unlock(&q->blkcg_mutex);
 			cond_resched();
 			goto restart;
 		}
 	}
 
@@ -610,10 +612,11 @@ static void blkg_destroy_all(struct gendisk *disk)
 			__clear_bit(pol->plid, q->blkcg_pols);
 	}
 
 	q->root_blkg = NULL;
 	spin_unlock_irq(&q->queue_lock);
+	mutex_unlock(&q->blkcg_mutex);
 
 	wake_up_var(&q->root_blkg);
 }
 
 static void blkg_iostat_set(struct blkg_iostat *dst, struct blkg_iostat *src)
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/5] bfq: protect q->blkg_list iteration in bfq_end_wr_async() with blkcg_mutex
  2026-06-03 13:27 [PATCH 0/5] blk-cgroup: fix blkg list and policy data races Yu Kuai
  2026-06-03 13:27 ` [PATCH 1/5] blk-cgroup: protect q->blkg_list iteration in blkg_destroy_all() with blkcg_mutex Yu Kuai
@ 2026-06-03 13:27 ` Yu Kuai
  2026-06-04 17:31   ` Nilay Shroff
  2026-06-03 13:27 ` [PATCH 3/5] blk-cgroup: fix race between policy activation and blkg destruction Yu Kuai
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Yu Kuai @ 2026-06-03 13:27 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Tejun Heo, Josef Bacik, Ming Lei, Nilay Shroff, Bart Van Assche,
	linux-block, cgroups, linux-kernel

bfq_end_wr_async() iterates q->blkg_list while only holding bfqd->lock,
but not blkcg_mutex. This can race with blkg_free_workfn() that removes
blkgs from the list while holding blkcg_mutex.

Add blkcg_mutex protection in bfq_end_wr() before taking bfqd->lock to
ensure proper synchronization when iterating q->blkg_list.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 block/bfq-cgroup.c  | 3 ++-
 block/bfq-iosched.c | 6 ++++++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index 37ab70930c8d..f765e767d36a 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -939,11 +939,12 @@ void bfq_end_wr_async(struct bfq_data *bfqd)
 	struct blkcg_gq *blkg;
 
 	list_for_each_entry(blkg, &bfqd->queue->blkg_list, q_node) {
 		struct bfq_group *bfqg = blkg_to_bfqg(blkg);
 
-		bfq_end_wr_async_queues(bfqd, bfqg);
+		if (bfqg)
+			bfq_end_wr_async_queues(bfqd, bfqg);
 	}
 	bfq_end_wr_async_queues(bfqd, bfqd->root_group);
 }
 
 static int bfq_io_show_weight_legacy(struct seq_file *sf, void *v)
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 141c602d5e85..42ccfd0c6140 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2643,10 +2643,13 @@ void bfq_end_wr_async_queues(struct bfq_data *bfqd,
 static void bfq_end_wr(struct bfq_data *bfqd)
 {
 	struct bfq_queue *bfqq;
 	int i;
 
+#ifdef CONFIG_BFQ_GROUP_IOSCHED
+	mutex_lock(&bfqd->queue->blkcg_mutex);
+#endif
 	spin_lock_irq(&bfqd->lock);
 
 	for (i = 0; i < bfqd->num_actuators; i++) {
 		list_for_each_entry(bfqq, &bfqd->active_list[i], bfqq_list)
 			bfq_bfqq_end_wr(bfqq);
@@ -2654,10 +2657,13 @@ static void bfq_end_wr(struct bfq_data *bfqd)
 	list_for_each_entry(bfqq, &bfqd->idle_list, bfqq_list)
 		bfq_bfqq_end_wr(bfqq);
 	bfq_end_wr_async(bfqd);
 
 	spin_unlock_irq(&bfqd->lock);
+#ifdef CONFIG_BFQ_GROUP_IOSCHED
+	mutex_unlock(&bfqd->queue->blkcg_mutex);
+#endif
 }
 
 static sector_t bfq_io_struct_pos(void *io_struct, bool request)
 {
 	if (request)
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/5] bfq: protect q->blkg_list iteration in bfq_end_wr_async() with blkcg_mutex
  2026-06-03 13:27 ` [PATCH 2/5] bfq: protect q->blkg_list iteration in bfq_end_wr_async() " Yu Kuai
@ 2026-06-04 17:31   ` Nilay Shroff
  2026-06-08  2:48     ` yu kuai
  0 siblings, 1 reply; 11+ messages in thread
From: Nilay Shroff @ 2026-06-04 17:31 UTC (permalink / raw)
  To: Yu Kuai, Jens Axboe
  Cc: Tejun Heo, Josef Bacik, Ming Lei, Bart Van Assche, linux-block,
	cgroups, linux-kernel

On 6/3/26 6:57 PM, Yu Kuai wrote:
> bfq_end_wr_async() iterates q->blkg_list while only holding bfqd->lock,
> but not blkcg_mutex. This can race with blkg_free_workfn() that removes
> blkgs from the list while holding blkcg_mutex.
> 
> Add blkcg_mutex protection in bfq_end_wr() before taking bfqd->lock to
> ensure proper synchronization when iterating q->blkg_list.
> 
> Signed-off-by: Yu Kuai <yukuai@fygo.io>
> ---
>   block/bfq-cgroup.c  | 3 ++-
>   block/bfq-iosched.c | 6 ++++++
>   2 files changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
> index 37ab70930c8d..f765e767d36a 100644
> --- a/block/bfq-cgroup.c
> +++ b/block/bfq-cgroup.c
> @@ -939,11 +939,12 @@ void bfq_end_wr_async(struct bfq_data *bfqd)
>   	struct blkcg_gq *blkg;
>   
>   	list_for_each_entry(blkg, &bfqd->queue->blkg_list, q_node) {
>   		struct bfq_group *bfqg = blkg_to_bfqg(blkg);
>   
> -		bfq_end_wr_async_queues(bfqd, bfqg);
> +		if (bfqg)
> +			bfq_end_wr_async_queues(bfqd, bfqg);
>   	}
>   	bfq_end_wr_async_queues(bfqd, bfqd->root_group);
>   }
>   
>   static int bfq_io_show_weight_legacy(struct seq_file *sf, void *v)
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 141c602d5e85..42ccfd0c6140 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -2643,10 +2643,13 @@ void bfq_end_wr_async_queues(struct bfq_data *bfqd,
>   static void bfq_end_wr(struct bfq_data *bfqd)
>   {
>   	struct bfq_queue *bfqq;
>   	int i;
>   
> +#ifdef CONFIG_BFQ_GROUP_IOSCHED
> +	mutex_lock(&bfqd->queue->blkcg_mutex);
> +#endif
>   	spin_lock_irq(&bfqd->lock);
>   
>   	for (i = 0; i < bfqd->num_actuators; i++) {
>   		list_for_each_entry(bfqq, &bfqd->active_list[i], bfqq_list)
>   			bfq_bfqq_end_wr(bfqq);
> @@ -2654,10 +2657,13 @@ static void bfq_end_wr(struct bfq_data *bfqd)
>   	list_for_each_entry(bfqq, &bfqd->idle_list, bfqq_list)
>   		bfq_bfqq_end_wr(bfqq);
>   	bfq_end_wr_async(bfqd);
>   
>   	spin_unlock_irq(&bfqd->lock);
> +#ifdef CONFIG_BFQ_GROUP_IOSCHED
> +	mutex_unlock(&bfqd->queue->blkcg_mutex);
> +#endif
>   }

The above change protects the q->blkg_list iteration in bfq_end_wr_async()
against list removal in blkg_free_workfn(). However the blkg insertion in
blkg_create() still doesn't use q->blkcg_mutex and so list traversal in
bfq_end_wr_async() may still race with blkg_create().

So I think we may also need to protect blkg insert in blkg_create() using
q->blkcg_mutex.

Thanks,
--Nilay

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/5] bfq: protect q->blkg_list iteration in bfq_end_wr_async() with blkcg_mutex
  2026-06-04 17:31   ` Nilay Shroff
@ 2026-06-08  2:48     ` yu kuai
  0 siblings, 0 replies; 11+ messages in thread
From: yu kuai @ 2026-06-08  2:48 UTC (permalink / raw)
  To: Nilay Shroff, Jens Axboe
  Cc: Tejun Heo, Josef Bacik, Ming Lei, Bart Van Assche, linux-block,
	cgroups, linux-kernel, yukuai

Hi,

在 2026/6/5 1:31, Nilay Shroff 写道:
> On 6/3/26 6:57 PM, Yu Kuai wrote:
>> bfq_end_wr_async() iterates q->blkg_list while only holding bfqd->lock,
>> but not blkcg_mutex. This can race with blkg_free_workfn() that removes
>> blkgs from the list while holding blkcg_mutex.
>>
>> Add blkcg_mutex protection in bfq_end_wr() before taking bfqd->lock to
>> ensure proper synchronization when iterating q->blkg_list.
>>
>> Signed-off-by: Yu Kuai <yukuai@fygo.io>
>> ---
>>   block/bfq-cgroup.c  | 3 ++-
>>   block/bfq-iosched.c | 6 ++++++
>>   2 files changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
>> index 37ab70930c8d..f765e767d36a 100644
>> --- a/block/bfq-cgroup.c
>> +++ b/block/bfq-cgroup.c
>> @@ -939,11 +939,12 @@ void bfq_end_wr_async(struct bfq_data *bfqd)
>>       struct blkcg_gq *blkg;
>>         list_for_each_entry(blkg, &bfqd->queue->blkg_list, q_node) {
>>           struct bfq_group *bfqg = blkg_to_bfqg(blkg);
>>   -        bfq_end_wr_async_queues(bfqd, bfqg);
>> +        if (bfqg)
>> +            bfq_end_wr_async_queues(bfqd, bfqg);
>>       }
>>       bfq_end_wr_async_queues(bfqd, bfqd->root_group);
>>   }
>>     static int bfq_io_show_weight_legacy(struct seq_file *sf, void *v)
>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>> index 141c602d5e85..42ccfd0c6140 100644
>> --- a/block/bfq-iosched.c
>> +++ b/block/bfq-iosched.c
>> @@ -2643,10 +2643,13 @@ void bfq_end_wr_async_queues(struct bfq_data 
>> *bfqd,
>>   static void bfq_end_wr(struct bfq_data *bfqd)
>>   {
>>       struct bfq_queue *bfqq;
>>       int i;
>>   +#ifdef CONFIG_BFQ_GROUP_IOSCHED
>> +    mutex_lock(&bfqd->queue->blkcg_mutex);
>> +#endif
>>       spin_lock_irq(&bfqd->lock);
>>         for (i = 0; i < bfqd->num_actuators; i++) {
>>           list_for_each_entry(bfqq, &bfqd->active_list[i], bfqq_list)
>>               bfq_bfqq_end_wr(bfqq);
>> @@ -2654,10 +2657,13 @@ static void bfq_end_wr(struct bfq_data *bfqd)
>>       list_for_each_entry(bfqq, &bfqd->idle_list, bfqq_list)
>>           bfq_bfqq_end_wr(bfqq);
>>       bfq_end_wr_async(bfqd);
>>         spin_unlock_irq(&bfqd->lock);
>> +#ifdef CONFIG_BFQ_GROUP_IOSCHED
>> +    mutex_unlock(&bfqd->queue->blkcg_mutex);
>> +#endif
>>   }
>
> The above change protects the q->blkg_list iteration in 
> bfq_end_wr_async()
> against list removal in blkg_free_workfn(). However the blkg insertion in
> blkg_create() still doesn't use q->blkcg_mutex and so list traversal in
> bfq_end_wr_async() may still race with blkg_create().
>
> So I think we may also need to protect blkg insert in blkg_create() using
> q->blkcg_mutex.

Yes, this is done in another huge patchset, because currently blkg_create()
is protected by queue_lock and can be called under rcu, code refactor will
be required.

I'll send the first set soon.

>
> Thanks,
> --Nilay

-- 
Thanks,
Kuai

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 3/5] blk-cgroup: fix race between policy activation and blkg destruction
  2026-06-03 13:27 [PATCH 0/5] blk-cgroup: fix blkg list and policy data races Yu Kuai
  2026-06-03 13:27 ` [PATCH 1/5] blk-cgroup: protect q->blkg_list iteration in blkg_destroy_all() with blkcg_mutex Yu Kuai
  2026-06-03 13:27 ` [PATCH 2/5] bfq: protect q->blkg_list iteration in bfq_end_wr_async() " Yu Kuai
@ 2026-06-03 13:27 ` Yu Kuai
  2026-06-03 13:27 ` [PATCH 4/5] blk-cgroup: skip dying blkg in blkcg_activate_policy() Yu Kuai
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Yu Kuai @ 2026-06-03 13:27 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Tejun Heo, Josef Bacik, Ming Lei, Nilay Shroff, Bart Van Assche,
	linux-block, cgroups, linux-kernel

From: Zheng Qixing <zhengqixing@huawei.com>

When switching an IO scheduler on a block device, blkcg_activate_policy()
allocates blkg_policy_data (pd) for all blkgs attached to the queue.
However, blkcg_activate_policy() may race with concurrent blkcg deletion,
leading to use-after-free and memory leak issues.

The use-after-free occurs in the following race:

T1 (blkcg_activate_policy):
  - Successfully allocates pd for blkg1 (loop0->queue, blkcgA)
  - Fails to allocate pd for blkg2 (loop0->queue, blkcgB)
  - Enters the enomem rollback path to release blkg1 resources

T2 (blkcg deletion):
  - blkcgA is deleted concurrently
  - blkg1 is freed via blkg_free_workfn()
  - blkg1->pd is freed

T1 (continued):
  - Rollback path accesses blkg1->pd->online after pd is freed
  - Triggers use-after-free

In addition, blkg_free_workfn() frees pd before removing the blkg from
q->blkg_list. This allows blkcg_activate_policy() to allocate a new pd
for a blkg that is being destroyed, leaving the newly allocated pd
unreachable when the blkg is finally freed.

Fix these races by extending blkcg_mutex coverage to serialize
blkcg_activate_policy() rollback and blkg destruction, ensuring pd
lifecycle is synchronized with blkg list visibility.

Fixes: f1c006f1c685 ("blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()")
Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>
Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 block/blk-cgroup.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index a98a22e06fd1..007dfc4f9c59 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1612,10 +1612,12 @@ int blkcg_activate_policy(struct gendisk *disk, const struct blkcg_policy *pol)
 	if (WARN_ON_ONCE(!pol->pd_alloc_fn || !pol->pd_free_fn))
 		return -EINVAL;
 
 	if (queue_is_mq(q))
 		memflags = blk_mq_freeze_queue(q);
+
+	mutex_lock(&q->blkcg_mutex);
 retry:
 	spin_lock_irq(&q->queue_lock);
 
 	/* blkg_list is pushed at the head, reverse walk to initialize parents first */
 	list_for_each_entry_reverse(blkg, &q->blkg_list, q_node) {
@@ -1674,10 +1676,11 @@ int blkcg_activate_policy(struct gendisk *disk, const struct blkcg_policy *pol)
 	__set_bit(pol->plid, q->blkcg_pols);
 	ret = 0;
 
 	spin_unlock_irq(&q->queue_lock);
 out:
+	mutex_unlock(&q->blkcg_mutex);
 	if (queue_is_mq(q))
 		blk_mq_unfreeze_queue(q, memflags);
 	if (pinned_blkg)
 		blkg_put(pinned_blkg);
 	if (pd_prealloc)
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 4/5] blk-cgroup: skip dying blkg in blkcg_activate_policy()
  2026-06-03 13:27 [PATCH 0/5] blk-cgroup: fix blkg list and policy data races Yu Kuai
                   ` (2 preceding siblings ...)
  2026-06-03 13:27 ` [PATCH 3/5] blk-cgroup: fix race between policy activation and blkg destruction Yu Kuai
@ 2026-06-03 13:27 ` Yu Kuai
  2026-06-03 13:27 ` [PATCH 5/5] blk-cgroup: factor policy pd teardown loop into helper Yu Kuai
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Yu Kuai @ 2026-06-03 13:27 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Tejun Heo, Josef Bacik, Ming Lei, Nilay Shroff, Bart Van Assche,
	linux-block, cgroups, linux-kernel

From: Zheng Qixing <zhengqixing@huawei.com>

When switching IO schedulers on a block device, blkcg_activate_policy()
can race with concurrent blkcg deletion, leading to a use-after-free in
rcu_accelerate_cbs.

T1:                               T2:
                                  blkg_destroy
                                  kill(&blkg->refcnt) // blkg->refcnt=1->0
                                  blkg_release // call_rcu(__blkg_release)
                                  ...
                                  blkg_free_workfn
                                  ->pd_free_fn(pd)
elv_iosched_store
elevator_switch
...
iterate blkg list
blkg_get(blkg) // blkg->refcnt=0->1
                                  list_del_init(&blkg->q_node)
blkg_put(pinned_blkg) // blkg->refcnt=1->0
blkg_release // call_rcu again
rcu_accelerate_cbs // uaf

Fix this by checking hlist_unhashed(&blkg->blkcg_node) before getting
a reference to the blkg. This is the same check used in blkg_destroy()
to detect if a blkg has already been destroyed. If the blkg is already
unhashed, skip processing it since it's being destroyed.

Fixes: f1c006f1c685 ("blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()")
Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>
Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 block/blk-cgroup.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 007dfc4f9c59..b479a9796fb3 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1623,10 +1623,12 @@ int blkcg_activate_policy(struct gendisk *disk, const struct blkcg_policy *pol)
 	list_for_each_entry_reverse(blkg, &q->blkg_list, q_node) {
 		struct blkg_policy_data *pd;
 
 		if (blkg->pd[pol->plid])
 			continue;
+		if (hlist_unhashed(&blkg->blkcg_node))
+			continue;
 
 		/* If prealloc matches, use it; otherwise try GFP_NOWAIT */
 		if (blkg == pinned_blkg) {
 			pd = pd_prealloc;
 			pd_prealloc = NULL;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 5/5] blk-cgroup: factor policy pd teardown loop into helper
  2026-06-03 13:27 [PATCH 0/5] blk-cgroup: fix blkg list and policy data races Yu Kuai
                   ` (3 preceding siblings ...)
  2026-06-03 13:27 ` [PATCH 4/5] blk-cgroup: skip dying blkg in blkcg_activate_policy() Yu Kuai
@ 2026-06-03 13:27 ` Yu Kuai
  2026-06-03 14:35 ` [PATCH 0/5] blk-cgroup: fix blkg list and policy data races Jens Axboe
  2026-06-09 15:15 ` Yizhou Tang
  6 siblings, 0 replies; 11+ messages in thread
From: Yu Kuai @ 2026-06-03 13:27 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Tejun Heo, Josef Bacik, Ming Lei, Nilay Shroff, Bart Van Assche,
	linux-block, cgroups, linux-kernel

From: Zheng Qixing <zhengqixing@huawei.com>

Move the teardown sequence which offlines and frees per-policy
blkg_policy_data (pd) into a helper for readability.

No functional change intended.

Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Yu Kuai <yukuai@fygo.io>
Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 block/blk-cgroup.c | 57 ++++++++++++++++++++++------------------------
 1 file changed, 27 insertions(+), 30 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index b479a9796fb3..c75b2a103bbc 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1575,10 +1575,35 @@ struct cgroup_subsys io_cgrp_subsys = {
 	.depends_on = 1 << memory_cgrp_id,
 #endif
 };
 EXPORT_SYMBOL_GPL(io_cgrp_subsys);
 
+/*
+ * Tear down per-blkg policy data for @pol on @q.
+ */
+static void blkcg_policy_teardown_pds(struct request_queue *q,
+				      const struct blkcg_policy *pol)
+{
+	struct blkcg_gq *blkg;
+
+	list_for_each_entry(blkg, &q->blkg_list, q_node) {
+		struct blkcg *blkcg = blkg->blkcg;
+		struct blkg_policy_data *pd;
+
+		spin_lock(&blkcg->lock);
+		pd = blkg->pd[pol->plid];
+		if (pd) {
+			if (pd->online && pol->pd_offline_fn)
+				pol->pd_offline_fn(pd);
+			pd->online = false;
+			pol->pd_free_fn(pd);
+			blkg->pd[pol->plid] = NULL;
+		}
+		spin_unlock(&blkcg->lock);
+	}
+}
+
 /**
  * blkcg_activate_policy - activate a blkcg policy on a gendisk
  * @disk: gendisk of interest
  * @pol: blkcg policy to activate
  *
@@ -1690,25 +1715,11 @@ int blkcg_activate_policy(struct gendisk *disk, const struct blkcg_policy *pol)
 	return ret;
 
 enomem:
 	/* alloc failed, take down everything */
 	spin_lock_irq(&q->queue_lock);
-	list_for_each_entry(blkg, &q->blkg_list, q_node) {
-		struct blkcg *blkcg = blkg->blkcg;
-		struct blkg_policy_data *pd;
-
-		spin_lock(&blkcg->lock);
-		pd = blkg->pd[pol->plid];
-		if (pd) {
-			if (pd->online && pol->pd_offline_fn)
-				pol->pd_offline_fn(pd);
-			pd->online = false;
-			pol->pd_free_fn(pd);
-			blkg->pd[pol->plid] = NULL;
-		}
-		spin_unlock(&blkcg->lock);
-	}
+	blkcg_policy_teardown_pds(q, pol);
 	spin_unlock_irq(&q->queue_lock);
 	ret = -ENOMEM;
 	goto out;
 }
 EXPORT_SYMBOL_GPL(blkcg_activate_policy);
@@ -1723,11 +1734,10 @@ EXPORT_SYMBOL_GPL(blkcg_activate_policy);
  */
 void blkcg_deactivate_policy(struct gendisk *disk,
 			     const struct blkcg_policy *pol)
 {
 	struct request_queue *q = disk->queue;
-	struct blkcg_gq *blkg;
 	unsigned int memflags;
 
 	if (!blkcg_policy_enabled(q, pol))
 		return;
 
@@ -1736,24 +1746,11 @@ void blkcg_deactivate_policy(struct gendisk *disk,
 
 	mutex_lock(&q->blkcg_mutex);
 	spin_lock_irq(&q->queue_lock);
 
 	__clear_bit(pol->plid, q->blkcg_pols);
-
-	list_for_each_entry(blkg, &q->blkg_list, q_node) {
-		struct blkcg *blkcg = blkg->blkcg;
-
-		spin_lock(&blkcg->lock);
-		if (blkg->pd[pol->plid]) {
-			if (blkg->pd[pol->plid]->online && pol->pd_offline_fn)
-				pol->pd_offline_fn(blkg->pd[pol->plid]);
-			pol->pd_free_fn(blkg->pd[pol->plid]);
-			blkg->pd[pol->plid] = NULL;
-		}
-		spin_unlock(&blkcg->lock);
-	}
-
+	blkcg_policy_teardown_pds(q, pol);
 	spin_unlock_irq(&q->queue_lock);
 	mutex_unlock(&q->blkcg_mutex);
 
 	if (queue_is_mq(q))
 		blk_mq_unfreeze_queue(q, memflags);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/5] blk-cgroup: fix blkg list and policy data races
  2026-06-03 13:27 [PATCH 0/5] blk-cgroup: fix blkg list and policy data races Yu Kuai
                   ` (4 preceding siblings ...)
  2026-06-03 13:27 ` [PATCH 5/5] blk-cgroup: factor policy pd teardown loop into helper Yu Kuai
@ 2026-06-03 14:35 ` Jens Axboe
       [not found]   ` <189b8601-87a6-423a-b267-9d550c31c086@fnnas.com>
  2026-06-09 15:15 ` Yizhou Tang
  6 siblings, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2026-06-03 14:35 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Tejun Heo, Josef Bacik, Ming Lei, Nilay Shroff, Bart Van Assche,
	linux-block, cgroups, linux-kernel

On 6/3/26 7:27 AM, Yu Kuai wrote:
> This small series fixes races between blkg destruction, q->blkg_list
> iteration, and blkcg policy activation.

In spam:

Authentication-Results: mx.google.com; spf=pass (google.com: domain of srs0=eh6w=d7=fygo.io=yukuai@kernel.org designates 2600:3c04:e001:324:0:1991:8:25 as permitted sender) smtp.mailfrom="SRS0=EH6w=D7=fygo.io=yukuai@kernel.org"; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=QUARANTINE) header.from=fygo.io

if you don't get this sorted out, your messages will be missed as they
end up in spam for most people.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 11+ messages in thread

[parent not found: <189b8601-87a6-423a-b267-9d550c31c086@fnnas.com>]

* Re: [PATCH 0/5] blk-cgroup: fix blkg list and policy data races
       [not found]   ` <189b8601-87a6-423a-b267-9d550c31c086@fnnas.com>
@ 2026-06-03 15:08     ` Jens Axboe
  0 siblings, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2026-06-03 15:08 UTC (permalink / raw)
  To: yukuai, Yu Kuai
  Cc: Tejun Heo, Josef Bacik, Ming Lei, Nilay Shroff, Bart Van Assche,
	linux-block, cgroups, linux-kernel

On 6/3/26 9:06 AM, yukuai wrote:
> Hi,
> 
> ? 2026/6/3 22:35, Jens Axboe ??:
>> On 6/3/26 7:27 AM, Yu Kuai wrote:
>>> This small series fixes races between blkg destruction, q->blkg_list
>>> iteration, and blkcg policy activation.
>> In spam: Authentication-Results: mx.google.com <mx.google.com>; spf=pass (google.com: <google.com:> domain of srs0=eh6w=d7=fygo.io=yukuai@kernel.org designates 2600:3c04:e001:324:0:1991:8:25 as permitted sender) smtp.mailfrom="SRS0=EH6w=D7=fygo.io=yukuai@kernel.org"; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=QUARANTINE) header.from=fygo.io <header.from=fygo.io> if you don't get this sorted out, your messages will be missed as they end up in spam for most people.
> 
> Thanks again for the notice. I think I finally figured out why dmarc failed. 

This email looks better!

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/5] blk-cgroup: fix blkg list and policy data races
  2026-06-03 13:27 [PATCH 0/5] blk-cgroup: fix blkg list and policy data races Yu Kuai
                   ` (5 preceding siblings ...)
  2026-06-03 14:35 ` [PATCH 0/5] blk-cgroup: fix blkg list and policy data races Jens Axboe
@ 2026-06-09 15:15 ` Yizhou Tang
  6 siblings, 0 replies; 11+ messages in thread
From: Yizhou Tang @ 2026-06-09 15:15 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Jens Axboe, Tejun Heo, Josef Bacik, Ming Lei, Nilay Shroff,
	Bart Van Assche, linux-block, cgroups, linux-kernel

On Wed, Jun 3, 2026 at 9:35 PM Yu Kuai <yukuai@fygo.io> wrote:
>
> This small series fixes races between blkg destruction, q->blkg_list
> iteration, and blkcg policy activation.
>
> The first two patches serialize q->blkg_list walks in blkg_destroy_all()
> and BFQ writeback weight-raising teardown with blkcg_mutex. The next two
> patches close policy activation races with concurrent blkg destruction,
> including skipping blkgs that are already dying. The final patch factors
> the common policy data teardown loop.
>
> This uses blkcg_mutex rather than extending queue_lock coverage because
> the races are about blkg list visibility and policy-data lifetime, not
> request-queue dispatch state. blkg_free_workfn() already uses
> blkcg_mutex to serialize policy-data freeing with policy deactivation
> and removes blkgs from q->blkg_list only after that teardown. Taking the
> same mutex around the remaining q->blkg_list walkers gives one sleepable
> serialization point for blkg lifetime, avoids adding more queue_lock
> nesting, and prepares the follow-up conversion that removes queue_lock
> from blkcg list protection entirely.

LGTM.

I had noticed some time ago that blkcg_activate_policy() did not take
q->blkcg_mutex, and this patchset nicely resolves my concern.

Reviewed-by: Tang Yizhou <yizhou.tang@shopee.com>

Best regards,
Yi


>
> Yu Kuai (2):
>   blk-cgroup: protect q->blkg_list iteration in blkg_destroy_all() with
>     blkcg_mutex
>   bfq: protect q->blkg_list iteration in bfq_end_wr_async() with
>     blkcg_mutex
>
> Zheng Qixing (3):
>   blk-cgroup: fix race between policy activation and blkg destruction
>   blk-cgroup: skip dying blkg in blkcg_activate_policy()
>   blk-cgroup: factor policy pd teardown loop into helper
>
>  block/bfq-cgroup.c  |  3 ++-
>  block/bfq-iosched.c |  6 +++++
>  block/blk-cgroup.c  | 65 ++++++++++++++++++++++++---------------------
>  3 files changed, 43 insertions(+), 31 deletions(-)
>
> --
> 2.51.0
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2026-06-09 15:15 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-03 13:27 [PATCH 0/5] blk-cgroup: fix blkg list and policy data races Yu Kuai
2026-06-03 13:27 ` [PATCH 1/5] blk-cgroup: protect q->blkg_list iteration in blkg_destroy_all() with blkcg_mutex Yu Kuai
2026-06-03 13:27 ` [PATCH 2/5] bfq: protect q->blkg_list iteration in bfq_end_wr_async() " Yu Kuai
2026-06-04 17:31   ` Nilay Shroff
2026-06-08  2:48     ` yu kuai
2026-06-03 13:27 ` [PATCH 3/5] blk-cgroup: fix race between policy activation and blkg destruction Yu Kuai
2026-06-03 13:27 ` [PATCH 4/5] blk-cgroup: skip dying blkg in blkcg_activate_policy() Yu Kuai
2026-06-03 13:27 ` [PATCH 5/5] blk-cgroup: factor policy pd teardown loop into helper Yu Kuai
2026-06-03 14:35 ` [PATCH 0/5] blk-cgroup: fix blkg list and policy data races Jens Axboe
     [not found]   ` <189b8601-87a6-423a-b267-9d550c31c086@fnnas.com>
2026-06-03 15:08     ` Jens Axboe
2026-06-09 15:15 ` Yizhou Tang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox