Linux cgroups development
 help / color / mirror / Atom feed
* [PATCH 0/2] fix two issues in blkg_create() error path
@ 2026-06-22  7:07 Zizhi Wo
  2026-06-22  7:07 ` [PATCH 1/2] blk-cgroup: fix blkg leak " Zizhi Wo
  2026-06-22  7:07 ` [PATCH 2/2] blk-cgroup: fix null-ptr-deref by freeing blkg pd on blkg_create " Zizhi Wo
  0 siblings, 2 replies; 5+ messages in thread
From: Zizhi Wo @ 2026-06-22  7:07 UTC (permalink / raw)
  To: axboe, tj, josef, linux-block
  Cc: cgroups, yangerkun, chengzhihao1, houtao1, yukuai, wozizhi

From: Zizhi Wo <wozizhi@huawei.com>

This series fixes two issues on the blkg_create() error path.

Patch 1 fixes a blkg leak when blkg_create() fails.

Patch 2 fixes a null-ptr-deref by freeing blkg->pd synchronously on the
error path, otherwise the async free path may dereference an already
unregistered blkcg_policy.

Zizhi Wo (2):
  blk-cgroup: fix blkg leak in blkg_create() error path
  blk-cgroup: fix Null-ptr-deref by freeing blkg pd on blkg_create error
    path

 block/blk-cgroup.c | 32 +++++++++++++++++++++++++-------
 1 file changed, 25 insertions(+), 7 deletions(-)

-- 
2.52.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/2] blk-cgroup: fix blkg leak in blkg_create() error path
  2026-06-22  7:07 [PATCH 0/2] fix two issues in blkg_create() error path Zizhi Wo
@ 2026-06-22  7:07 ` Zizhi Wo
  2026-06-23  1:16   ` Tao Cui
  2026-06-22  7:07 ` [PATCH 2/2] blk-cgroup: fix null-ptr-deref by freeing blkg pd on blkg_create " Zizhi Wo
  1 sibling, 1 reply; 5+ messages in thread
From: Zizhi Wo @ 2026-06-22  7:07 UTC (permalink / raw)
  To: axboe, tj, josef, linux-block
  Cc: cgroups, yangerkun, chengzhihao1, houtao1, yukuai, wozizhi

When radix_tree_insert() fails in blkg_create(), the error path calls
blkg_put() to release the blkg. This was correct when blkg->refcnt was an
atomic_t: blkg_put() dropped it to 0 and triggered the release path.

But commit 7fcf2b033b84 ("blkcg: change blkg reference counting to use
percpu_ref") switched refcnt to a percpu_ref. In percpu mode
percpu_ref_put() never checks for zero, so the release callback is never
invoked. This blkg is on neither blkcg->blkg_list nor queue->blkg_list, so
blkg_destroy_all() / blkcg_destroy_blkgs() can never reach it to call
blkg_destroy()->percpu_ref_kill() either, cause the leak.

Fix it by killing the percpu_ref instead, which switches it to atomic mode
and drops the initial ref.

Fixes: 7fcf2b033b84 ("blkcg: change blkg reference counting to use percpu_ref")
Signed-off-by: Zizhi Wo <wozizhi@huaweicloud.com>
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
---
 block/blk-cgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index bc63bd220865..6386fe413994 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -437,11 +437,11 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg, struct gendisk *disk,
 
 	if (!ret)
 		return blkg;
 
 	/* @blkg failed fully initialized, use the usual release path */
-	blkg_put(blkg);
+	percpu_ref_kill(&blkg->refcnt);
 	return ERR_PTR(ret);
 
 err_put_css:
 	css_put(&blkcg->css);
 err_free_blkg:
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/2] blk-cgroup: fix null-ptr-deref by freeing blkg pd on blkg_create error path
  2026-06-22  7:07 [PATCH 0/2] fix two issues in blkg_create() error path Zizhi Wo
  2026-06-22  7:07 ` [PATCH 1/2] blk-cgroup: fix blkg leak " Zizhi Wo
@ 2026-06-22  7:07 ` Zizhi Wo
  1 sibling, 0 replies; 5+ messages in thread
From: Zizhi Wo @ 2026-06-22  7:07 UTC (permalink / raw)
  To: axboe, tj, josef, linux-block
  Cc: cgroups, yangerkun, chengzhihao1, houtao1, yukuai, wozizhi

From: Zizhi Wo <wozizhi@huawei.com>

When blkg_create() fails before the blkg is linked onto blkcg->blkg_list
and q->blkg_list (e.g. radix_tree_insert() fails or the blkg_lookup()
returns NULL), the blkg is freed asynchronously via blkg_free_workfn().

Since such a blkg was never linked, it is invisible to
blkcg_deactivate_policy(), so its blkg->pd[] entries can not be cleared in
it. blkg_free_workfn() then calls blkcg_policy->pd_free_fn() on them, which
can race with bfq module exit (bfq_exit() -> blkcg_policy_unregister())
clearing the blkcg_policy[] slot, leading to a NULL pointer dereference:

[   72.597786] KASAN: null-ptr-deref in range [0x0000000000000048-0x000000000000004f]
[   72.598690] CPU: 35 UID: 0 PID: 458 Comm: kworker/35:1 Not tainted 7.1.0+ #33 PREEMPT(full)
[   72.599518] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-4.fc41 04/01/2014
[   72.600342] Workqueue: events blkg_free_workfn
[   72.600991] RIP: 0010:blkg_free_workfn+0x115/0x3d0
......
[   72.613278] Call Trace:
[   72.613988]  <TASK>
[   72.614357]  process_one_work+0x6b4/0xff0
[   72.615251]  ? __pfx_blkg_free_workfn+0x10/0x10
[   72.616041]  ? assign_work+0x131/0x3f0
[   72.616962]  worker_thread+0x4eb/0xd50
[   72.617599]  ? __kthread_parkme+0x8d/0x170
[   72.618565]  ? __pfx_worker_thread+0x10/0x10
[   72.619566]  ? __pfx_worker_thread+0x10/0x10
[   72.620213]  kthread+0x327/0x410
......

Fix this by introducing blkg_free_pd() to synchronously free the pd and
clear blkg->pd[] in the blkg_create() error path, while the blkcg_policy
is still valid.

Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
---
 block/blk-cgroup.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 6386fe413994..673886d71c26 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -109,28 +109,37 @@ static struct cgroup_subsys_state *blkcg_css(void)
 	if (css)
 		return css;
 	return task_css(current, io_cgrp_id);
 }
 
+static void blkg_free_pd(struct blkcg_gq *blkg)
+{
+	int i;
+
+	for (i = 0; i < BLKCG_MAX_POLS; i++) {
+		if (blkg->pd[i]) {
+			blkcg_policy[i]->pd_free_fn(blkg->pd[i]);
+			blkg->pd[i] = NULL;
+		}
+	}
+}
+
 static void blkg_free_workfn(struct work_struct *work)
 {
 	struct blkcg_gq *blkg = container_of(work, struct blkcg_gq,
 					     free_work);
 	struct request_queue *q = blkg->q;
-	int i;
 
 	/*
 	 * pd_free_fn() can also be called from blkcg_deactivate_policy(),
 	 * in order to make sure pd_free_fn() is called in order, the deletion
 	 * of the list blkg->q_node is delayed to here from blkg_destroy(), and
 	 * blkcg_mutex is used to synchronize blkg_free_workfn() and
 	 * blkcg_deactivate_policy().
 	 */
 	mutex_lock(&q->blkcg_mutex);
-	for (i = 0; i < BLKCG_MAX_POLS; i++)
-		if (blkg->pd[i])
-			blkcg_policy[i]->pd_free_fn(blkg->pd[i]);
+	blkg_free_pd(blkg);
 	if (blkg->parent)
 		blkg_put(blkg->parent);
 	spin_lock_irq(&q->queue_lock);
 	list_del_init(&blkg->q_node);
 	spin_unlock_irq(&q->queue_lock);
@@ -436,19 +445,28 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg, struct gendisk *disk,
 	spin_unlock(&blkcg->lock);
 
 	if (!ret)
 		return blkg;
 
-	/* @blkg failed fully initialized, use the usual release path */
+	/*
+	 * @blkg failed fully initialized and never linked, so its pd[] is
+	 * invisible to blkcg_deactivate_policy(). Free pd[] synchronously
+	 * while blkcg_policy[] is still valid, otherwise the async free path
+	 * may call pd_free_fn() after the policy is unregistered (e.g. rmmod bfq).
+	 * The err_free_blkg path below frees pd[] for the same reason.
+	 */
+	blkg_free_pd(blkg);
 	percpu_ref_kill(&blkg->refcnt);
 	return ERR_PTR(ret);
 
 err_put_css:
 	css_put(&blkcg->css);
 err_free_blkg:
-	if (new_blkg)
+	if (new_blkg) {
+		blkg_free_pd(new_blkg);
 		blkg_free(new_blkg);
+	}
 	return ERR_PTR(ret);
 }
 
 /**
  * blkg_lookup_create - lookup blkg, try to create one if not there
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/2] blk-cgroup: fix blkg leak in blkg_create() error path
  2026-06-22  7:07 ` [PATCH 1/2] blk-cgroup: fix blkg leak " Zizhi Wo
@ 2026-06-23  1:16   ` Tao Cui
  2026-06-23  1:38     ` Zizhi Wo
  0 siblings, 1 reply; 5+ messages in thread
From: Tao Cui @ 2026-06-23  1:16 UTC (permalink / raw)
  To: Zizhi Wo, axboe, tj, josef, linux-block
  Cc: cui.tao, cgroups, yangerkun, chengzhihao1, houtao1, yukuai

Hi Zizhi,

Thanks for the patch.  I ran into the same issue and posted a fix for it
earlier:

  https://lore.kernel.org/all/20260507061229.57466-1-cuitao@kylinos.cn/

The leak fix is identical to yours (blkg_put() -> percpu_ref_kill()),
plus one extra change: moving blkg->online = true into the success
block:

	if (likely(!ret)) {
		...
+		blkg->online = true;
	}
-	blkg->online = true;

On the failure path the blkg was never inserted into any list, and its
blkg->pd[i]->online flags were not set either (those are in the same
block).  Leaving blkg->online = true marks a blkg as online that was
never created -- inconsistent with pd[]->online and with
blkg_destroy(), which clears blkg->online = false.  Not observable
today, since the failed blkg is on no list and unreachable by the
online readers, but the flag should track the actual insertion.

(This was sent to the cgroups list rather than linux-block, hence the
overlap.)

Thanks,
Tao

在 2026/6/22 15:07, Zizhi Wo 写道:
> When radix_tree_insert() fails in blkg_create(), the error path calls
> blkg_put() to release the blkg. This was correct when blkg->refcnt was an
> atomic_t: blkg_put() dropped it to 0 and triggered the release path.
> 
> But commit 7fcf2b033b84 ("blkcg: change blkg reference counting to use
> percpu_ref") switched refcnt to a percpu_ref. In percpu mode
> percpu_ref_put() never checks for zero, so the release callback is never
> invoked. This blkg is on neither blkcg->blkg_list nor queue->blkg_list, so
> blkg_destroy_all() / blkcg_destroy_blkgs() can never reach it to call
> blkg_destroy()->percpu_ref_kill() either, cause the leak.
> 
> Fix it by killing the percpu_ref instead, which switches it to atomic mode
> and drops the initial ref.
> 
> Fixes: 7fcf2b033b84 ("blkcg: change blkg reference counting to use percpu_ref")
> Signed-off-by: Zizhi Wo <wozizhi@huaweicloud.com>
> Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
> ---
>  block/blk-cgroup.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> index bc63bd220865..6386fe413994 100644
> --- a/block/blk-cgroup.c
> +++ b/block/blk-cgroup.c
> @@ -437,11 +437,11 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg, struct gendisk *disk,
>  
>  	if (!ret)
>  		return blkg;
>  
>  	/* @blkg failed fully initialized, use the usual release path */
> -	blkg_put(blkg);
> +	percpu_ref_kill(&blkg->refcnt);
>  	return ERR_PTR(ret);
>  
>  err_put_css:
>  	css_put(&blkcg->css);
>  err_free_blkg:


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/2] blk-cgroup: fix blkg leak in blkg_create() error path
  2026-06-23  1:16   ` Tao Cui
@ 2026-06-23  1:38     ` Zizhi Wo
  0 siblings, 0 replies; 5+ messages in thread
From: Zizhi Wo @ 2026-06-23  1:38 UTC (permalink / raw)
  To: Tao Cui, axboe, tj, josef, linux-block
  Cc: cgroups, yangerkun, chengzhihao1, houtao1, yukuai



在 2026/6/23 9:16, Tao Cui 写道:
> Hi Zizhi,
> 
> Thanks for the patch.  I ran into the same issue and posted a fix for it
> earlier:
> 
>    https://lore.kernel.org/all/20260507061229.57466-1-cuitao@kylinos.cn/
> 
> The leak fix is identical to yours (blkg_put() -> percpu_ref_kill()),
> plus one extra change: moving blkg->online = true into the success
> block:
> 
> 	if (likely(!ret)) {
> 		...
> +		blkg->online = true;
> 	}
> -	blkg->online = true;
> 
> On the failure path the blkg was never inserted into any list, and its
> blkg->pd[i]->online flags were not set either (those are in the same
> block).  Leaving blkg->online = true marks a blkg as online that was
> never created -- inconsistent with pd[]->online and with
> blkg_destroy(), which clears blkg->online = false.  Not observable
> today, since the failed blkg is on no list and unreachable by the
> online readers, but the flag should track the actual insertion.
> 
> (This was sent to the cgroups list rather than linux-block, hence the
> overlap.)
> 
> Thanks,
> Tao

I'm not subscribed to the cgroup mailing list, so I didn't see that this
issue had already been fixed. :( And indeed, your patch nicely updates
blkg->online as well. — I hadn't realized that.

Thanks for the heads-up!

Thanks,
Zizhi Wo

> 
> 在 2026/6/22 15:07, Zizhi Wo 写道:
>> When radix_tree_insert() fails in blkg_create(), the error path calls
>> blkg_put() to release the blkg. This was correct when blkg->refcnt was an
>> atomic_t: blkg_put() dropped it to 0 and triggered the release path.
>>
>> But commit 7fcf2b033b84 ("blkcg: change blkg reference counting to use
>> percpu_ref") switched refcnt to a percpu_ref. In percpu mode
>> percpu_ref_put() never checks for zero, so the release callback is never
>> invoked. This blkg is on neither blkcg->blkg_list nor queue->blkg_list, so
>> blkg_destroy_all() / blkcg_destroy_blkgs() can never reach it to call
>> blkg_destroy()->percpu_ref_kill() either, cause the leak.
>>
>> Fix it by killing the percpu_ref instead, which switches it to atomic mode
>> and drops the initial ref.
>>
>> Fixes: 7fcf2b033b84 ("blkcg: change blkg reference counting to use percpu_ref")
>> Signed-off-by: Zizhi Wo <wozizhi@huaweicloud.com>
>> Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
>> ---
>>   block/blk-cgroup.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
>> index bc63bd220865..6386fe413994 100644
>> --- a/block/blk-cgroup.c
>> +++ b/block/blk-cgroup.c
>> @@ -437,11 +437,11 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg, struct gendisk *disk,
>>   
>>   	if (!ret)
>>   		return blkg;
>>   
>>   	/* @blkg failed fully initialized, use the usual release path */
>> -	blkg_put(blkg);
>> +	percpu_ref_kill(&blkg->refcnt);
>>   	return ERR_PTR(ret);
>>   
>>   err_put_css:
>>   	css_put(&blkcg->css);
>>   err_free_blkg:


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-06-23  1:38 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-22  7:07 [PATCH 0/2] fix two issues in blkg_create() error path Zizhi Wo
2026-06-22  7:07 ` [PATCH 1/2] blk-cgroup: fix blkg leak " Zizhi Wo
2026-06-23  1:16   ` Tao Cui
2026-06-23  1:38     ` Zizhi Wo
2026-06-22  7:07 ` [PATCH 2/2] blk-cgroup: fix null-ptr-deref by freeing blkg pd on blkg_create " Zizhi Wo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox