* [PATCH 0/2] fix two issues in blkg_create() error path
@ 2026-06-22 7:07 Zizhi Wo
2026-06-22 7:07 ` [PATCH 1/2] blk-cgroup: fix blkg leak " Zizhi Wo
2026-06-22 7:07 ` [PATCH 2/2] blk-cgroup: fix null-ptr-deref by freeing blkg pd on blkg_create " Zizhi Wo
0 siblings, 2 replies; 5+ messages in thread
From: Zizhi Wo @ 2026-06-22 7:07 UTC (permalink / raw)
To: axboe, tj, josef, linux-block
Cc: cgroups, yangerkun, chengzhihao1, houtao1, yukuai, wozizhi
From: Zizhi Wo <wozizhi@huawei.com>
This series fixes two issues on the blkg_create() error path.
Patch 1 fixes a blkg leak when blkg_create() fails.
Patch 2 fixes a null-ptr-deref by freeing blkg->pd synchronously on the
error path, otherwise the async free path may dereference an already
unregistered blkcg_policy.
Zizhi Wo (2):
blk-cgroup: fix blkg leak in blkg_create() error path
blk-cgroup: fix Null-ptr-deref by freeing blkg pd on blkg_create error
path
block/blk-cgroup.c | 32 +++++++++++++++++++++++++-------
1 file changed, 25 insertions(+), 7 deletions(-)
--
2.52.0
^ permalink raw reply [flat|nested] 5+ messages in thread* [PATCH 1/2] blk-cgroup: fix blkg leak in blkg_create() error path 2026-06-22 7:07 [PATCH 0/2] fix two issues in blkg_create() error path Zizhi Wo @ 2026-06-22 7:07 ` Zizhi Wo 2026-06-23 1:16 ` Tao Cui 2026-06-22 7:07 ` [PATCH 2/2] blk-cgroup: fix null-ptr-deref by freeing blkg pd on blkg_create " Zizhi Wo 1 sibling, 1 reply; 5+ messages in thread From: Zizhi Wo @ 2026-06-22 7:07 UTC (permalink / raw) To: axboe, tj, josef, linux-block Cc: cgroups, yangerkun, chengzhihao1, houtao1, yukuai, wozizhi When radix_tree_insert() fails in blkg_create(), the error path calls blkg_put() to release the blkg. This was correct when blkg->refcnt was an atomic_t: blkg_put() dropped it to 0 and triggered the release path. But commit 7fcf2b033b84 ("blkcg: change blkg reference counting to use percpu_ref") switched refcnt to a percpu_ref. In percpu mode percpu_ref_put() never checks for zero, so the release callback is never invoked. This blkg is on neither blkcg->blkg_list nor queue->blkg_list, so blkg_destroy_all() / blkcg_destroy_blkgs() can never reach it to call blkg_destroy()->percpu_ref_kill() either, cause the leak. Fix it by killing the percpu_ref instead, which switches it to atomic mode and drops the initial ref. Fixes: 7fcf2b033b84 ("blkcg: change blkg reference counting to use percpu_ref") Signed-off-by: Zizhi Wo <wozizhi@huaweicloud.com> Signed-off-by: Zizhi Wo <wozizhi@huawei.com> --- block/blk-cgroup.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index bc63bd220865..6386fe413994 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -437,11 +437,11 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg, struct gendisk *disk, if (!ret) return blkg; /* @blkg failed fully initialized, use the usual release path */ - blkg_put(blkg); + percpu_ref_kill(&blkg->refcnt); return ERR_PTR(ret); err_put_css: css_put(&blkcg->css); err_free_blkg: -- 2.52.0 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 1/2] blk-cgroup: fix blkg leak in blkg_create() error path 2026-06-22 7:07 ` [PATCH 1/2] blk-cgroup: fix blkg leak " Zizhi Wo @ 2026-06-23 1:16 ` Tao Cui 2026-06-23 1:38 ` Zizhi Wo 0 siblings, 1 reply; 5+ messages in thread From: Tao Cui @ 2026-06-23 1:16 UTC (permalink / raw) To: Zizhi Wo, axboe, tj, josef, linux-block Cc: cui.tao, cgroups, yangerkun, chengzhihao1, houtao1, yukuai Hi Zizhi, Thanks for the patch. I ran into the same issue and posted a fix for it earlier: https://lore.kernel.org/all/20260507061229.57466-1-cuitao@kylinos.cn/ The leak fix is identical to yours (blkg_put() -> percpu_ref_kill()), plus one extra change: moving blkg->online = true into the success block: if (likely(!ret)) { ... + blkg->online = true; } - blkg->online = true; On the failure path the blkg was never inserted into any list, and its blkg->pd[i]->online flags were not set either (those are in the same block). Leaving blkg->online = true marks a blkg as online that was never created -- inconsistent with pd[]->online and with blkg_destroy(), which clears blkg->online = false. Not observable today, since the failed blkg is on no list and unreachable by the online readers, but the flag should track the actual insertion. (This was sent to the cgroups list rather than linux-block, hence the overlap.) Thanks, Tao 在 2026/6/22 15:07, Zizhi Wo 写道: > When radix_tree_insert() fails in blkg_create(), the error path calls > blkg_put() to release the blkg. This was correct when blkg->refcnt was an > atomic_t: blkg_put() dropped it to 0 and triggered the release path. > > But commit 7fcf2b033b84 ("blkcg: change blkg reference counting to use > percpu_ref") switched refcnt to a percpu_ref. In percpu mode > percpu_ref_put() never checks for zero, so the release callback is never > invoked. This blkg is on neither blkcg->blkg_list nor queue->blkg_list, so > blkg_destroy_all() / blkcg_destroy_blkgs() can never reach it to call > blkg_destroy()->percpu_ref_kill() either, cause the leak. > > Fix it by killing the percpu_ref instead, which switches it to atomic mode > and drops the initial ref. > > Fixes: 7fcf2b033b84 ("blkcg: change blkg reference counting to use percpu_ref") > Signed-off-by: Zizhi Wo <wozizhi@huaweicloud.com> > Signed-off-by: Zizhi Wo <wozizhi@huawei.com> > --- > block/blk-cgroup.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c > index bc63bd220865..6386fe413994 100644 > --- a/block/blk-cgroup.c > +++ b/block/blk-cgroup.c > @@ -437,11 +437,11 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg, struct gendisk *disk, > > if (!ret) > return blkg; > > /* @blkg failed fully initialized, use the usual release path */ > - blkg_put(blkg); > + percpu_ref_kill(&blkg->refcnt); > return ERR_PTR(ret); > > err_put_css: > css_put(&blkcg->css); > err_free_blkg: ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 1/2] blk-cgroup: fix blkg leak in blkg_create() error path 2026-06-23 1:16 ` Tao Cui @ 2026-06-23 1:38 ` Zizhi Wo 0 siblings, 0 replies; 5+ messages in thread From: Zizhi Wo @ 2026-06-23 1:38 UTC (permalink / raw) To: Tao Cui, axboe, tj, josef, linux-block Cc: cgroups, yangerkun, chengzhihao1, houtao1, yukuai 在 2026/6/23 9:16, Tao Cui 写道: > Hi Zizhi, > > Thanks for the patch. I ran into the same issue and posted a fix for it > earlier: > > https://lore.kernel.org/all/20260507061229.57466-1-cuitao@kylinos.cn/ > > The leak fix is identical to yours (blkg_put() -> percpu_ref_kill()), > plus one extra change: moving blkg->online = true into the success > block: > > if (likely(!ret)) { > ... > + blkg->online = true; > } > - blkg->online = true; > > On the failure path the blkg was never inserted into any list, and its > blkg->pd[i]->online flags were not set either (those are in the same > block). Leaving blkg->online = true marks a blkg as online that was > never created -- inconsistent with pd[]->online and with > blkg_destroy(), which clears blkg->online = false. Not observable > today, since the failed blkg is on no list and unreachable by the > online readers, but the flag should track the actual insertion. > > (This was sent to the cgroups list rather than linux-block, hence the > overlap.) > > Thanks, > Tao I'm not subscribed to the cgroup mailing list, so I didn't see that this issue had already been fixed. :( And indeed, your patch nicely updates blkg->online as well. — I hadn't realized that. Thanks for the heads-up! Thanks, Zizhi Wo > > 在 2026/6/22 15:07, Zizhi Wo 写道: >> When radix_tree_insert() fails in blkg_create(), the error path calls >> blkg_put() to release the blkg. This was correct when blkg->refcnt was an >> atomic_t: blkg_put() dropped it to 0 and triggered the release path. >> >> But commit 7fcf2b033b84 ("blkcg: change blkg reference counting to use >> percpu_ref") switched refcnt to a percpu_ref. In percpu mode >> percpu_ref_put() never checks for zero, so the release callback is never >> invoked. This blkg is on neither blkcg->blkg_list nor queue->blkg_list, so >> blkg_destroy_all() / blkcg_destroy_blkgs() can never reach it to call >> blkg_destroy()->percpu_ref_kill() either, cause the leak. >> >> Fix it by killing the percpu_ref instead, which switches it to atomic mode >> and drops the initial ref. >> >> Fixes: 7fcf2b033b84 ("blkcg: change blkg reference counting to use percpu_ref") >> Signed-off-by: Zizhi Wo <wozizhi@huaweicloud.com> >> Signed-off-by: Zizhi Wo <wozizhi@huawei.com> >> --- >> block/blk-cgroup.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c >> index bc63bd220865..6386fe413994 100644 >> --- a/block/blk-cgroup.c >> +++ b/block/blk-cgroup.c >> @@ -437,11 +437,11 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg, struct gendisk *disk, >> >> if (!ret) >> return blkg; >> >> /* @blkg failed fully initialized, use the usual release path */ >> - blkg_put(blkg); >> + percpu_ref_kill(&blkg->refcnt); >> return ERR_PTR(ret); >> >> err_put_css: >> css_put(&blkcg->css); >> err_free_blkg: ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 2/2] blk-cgroup: fix null-ptr-deref by freeing blkg pd on blkg_create error path 2026-06-22 7:07 [PATCH 0/2] fix two issues in blkg_create() error path Zizhi Wo 2026-06-22 7:07 ` [PATCH 1/2] blk-cgroup: fix blkg leak " Zizhi Wo @ 2026-06-22 7:07 ` Zizhi Wo 1 sibling, 0 replies; 5+ messages in thread From: Zizhi Wo @ 2026-06-22 7:07 UTC (permalink / raw) To: axboe, tj, josef, linux-block Cc: cgroups, yangerkun, chengzhihao1, houtao1, yukuai, wozizhi From: Zizhi Wo <wozizhi@huawei.com> When blkg_create() fails before the blkg is linked onto blkcg->blkg_list and q->blkg_list (e.g. radix_tree_insert() fails or the blkg_lookup() returns NULL), the blkg is freed asynchronously via blkg_free_workfn(). Since such a blkg was never linked, it is invisible to blkcg_deactivate_policy(), so its blkg->pd[] entries can not be cleared in it. blkg_free_workfn() then calls blkcg_policy->pd_free_fn() on them, which can race with bfq module exit (bfq_exit() -> blkcg_policy_unregister()) clearing the blkcg_policy[] slot, leading to a NULL pointer dereference: [ 72.597786] KASAN: null-ptr-deref in range [0x0000000000000048-0x000000000000004f] [ 72.598690] CPU: 35 UID: 0 PID: 458 Comm: kworker/35:1 Not tainted 7.1.0+ #33 PREEMPT(full) [ 72.599518] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-4.fc41 04/01/2014 [ 72.600342] Workqueue: events blkg_free_workfn [ 72.600991] RIP: 0010:blkg_free_workfn+0x115/0x3d0 ...... [ 72.613278] Call Trace: [ 72.613988] <TASK> [ 72.614357] process_one_work+0x6b4/0xff0 [ 72.615251] ? __pfx_blkg_free_workfn+0x10/0x10 [ 72.616041] ? assign_work+0x131/0x3f0 [ 72.616962] worker_thread+0x4eb/0xd50 [ 72.617599] ? __kthread_parkme+0x8d/0x170 [ 72.618565] ? __pfx_worker_thread+0x10/0x10 [ 72.619566] ? __pfx_worker_thread+0x10/0x10 [ 72.620213] kthread+0x327/0x410 ...... Fix this by introducing blkg_free_pd() to synchronously free the pd and clear blkg->pd[] in the blkg_create() error path, while the blkcg_policy is still valid. Signed-off-by: Zizhi Wo <wozizhi@huawei.com> --- block/blk-cgroup.c | 30 ++++++++++++++++++++++++------ 1 file changed, 24 insertions(+), 6 deletions(-) diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index 6386fe413994..673886d71c26 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -109,28 +109,37 @@ static struct cgroup_subsys_state *blkcg_css(void) if (css) return css; return task_css(current, io_cgrp_id); } +static void blkg_free_pd(struct blkcg_gq *blkg) +{ + int i; + + for (i = 0; i < BLKCG_MAX_POLS; i++) { + if (blkg->pd[i]) { + blkcg_policy[i]->pd_free_fn(blkg->pd[i]); + blkg->pd[i] = NULL; + } + } +} + static void blkg_free_workfn(struct work_struct *work) { struct blkcg_gq *blkg = container_of(work, struct blkcg_gq, free_work); struct request_queue *q = blkg->q; - int i; /* * pd_free_fn() can also be called from blkcg_deactivate_policy(), * in order to make sure pd_free_fn() is called in order, the deletion * of the list blkg->q_node is delayed to here from blkg_destroy(), and * blkcg_mutex is used to synchronize blkg_free_workfn() and * blkcg_deactivate_policy(). */ mutex_lock(&q->blkcg_mutex); - for (i = 0; i < BLKCG_MAX_POLS; i++) - if (blkg->pd[i]) - blkcg_policy[i]->pd_free_fn(blkg->pd[i]); + blkg_free_pd(blkg); if (blkg->parent) blkg_put(blkg->parent); spin_lock_irq(&q->queue_lock); list_del_init(&blkg->q_node); spin_unlock_irq(&q->queue_lock); @@ -436,19 +445,28 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg, struct gendisk *disk, spin_unlock(&blkcg->lock); if (!ret) return blkg; - /* @blkg failed fully initialized, use the usual release path */ + /* + * @blkg failed fully initialized and never linked, so its pd[] is + * invisible to blkcg_deactivate_policy(). Free pd[] synchronously + * while blkcg_policy[] is still valid, otherwise the async free path + * may call pd_free_fn() after the policy is unregistered (e.g. rmmod bfq). + * The err_free_blkg path below frees pd[] for the same reason. + */ + blkg_free_pd(blkg); percpu_ref_kill(&blkg->refcnt); return ERR_PTR(ret); err_put_css: css_put(&blkcg->css); err_free_blkg: - if (new_blkg) + if (new_blkg) { + blkg_free_pd(new_blkg); blkg_free(new_blkg); + } return ERR_PTR(ret); } /** * blkg_lookup_create - lookup blkg, try to create one if not there -- 2.52.0 ^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-06-23 1:38 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-06-22 7:07 [PATCH 0/2] fix two issues in blkg_create() error path Zizhi Wo 2026-06-22 7:07 ` [PATCH 1/2] blk-cgroup: fix blkg leak " Zizhi Wo 2026-06-23 1:16 ` Tao Cui 2026-06-23 1:38 ` Zizhi Wo 2026-06-22 7:07 ` [PATCH 2/2] blk-cgroup: fix null-ptr-deref by freeing blkg pd on blkg_create " Zizhi Wo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox