* [PATCH] blk-cgroup: wait for blkcg cleanup before initializing new disk
@ 2026-03-09 10:09 Ming Lei
2026-03-10 13:20 ` Christoph Hellwig
0 siblings, 1 reply; 3+ messages in thread
From: Ming Lei @ 2026-03-09 10:09 UTC
To: Jens Axboe, linux-block; +Cc: Ming Lei, Yi Zhang, Tejun Heo
When a queue is shared across disk rebind (e.g., SCSI unbind/bind), the
previous disk's blkcg state is cleaned up asynchronously via
disk_release() -> blkcg_exit_disk(). If the new disk's blkcg_init_disk()
runs before that cleanup finishes, we may overwrite q->root_blkg while
the old one is still alive, and radix_tree_insert() in blkg_create()
fails with -EEXIST because the old blkg entries still occupy the same
queue id slot in blkcg->blkg_tree. This causes the sd probe to fail
with -ENOMEM.
Fix it by waiting in blkcg_init_disk() for root_blkg to become NULL,
which indicates the previous disk's blkcg cleanup has completed.
Fixes: 1059699f87eb ("block: move blkcg initialization/destroy into disk allocation/release handler")
Cc: Yi Zhang <yi.zhang@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-cgroup.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index b70096497d38..7aa2ed7f7c82 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1498,6 +1498,27 @@ int blkcg_init_disk(struct gendisk *disk)
 	struct blkcg_gq *new_blkg, *blkg;
 	bool preloaded;
 
+	/*
+	 * If the queue is shared across disk rebind (e.g., SCSI), the
+	 * previous disk's blkcg state is cleaned up asynchronously via
+	 * disk_release() -> blkcg_exit_disk(). Wait for that cleanup to
+	 * finish (indicated by root_blkg becoming NULL) before setting up
+	 * new blkcg state. Otherwise, we may overwrite q->root_blkg while
+	 * the old one is still alive, and radix_tree_insert() in
+	 * blkg_create() will fail with -EEXIST because the old entries
+	 * still occupy the same queue id slot in blkcg->blkg_tree.
+	 */
+	if (READ_ONCE(q->root_blkg)) {
+		/* 20s is a random timeout, disk_release() should be done well before */
+		unsigned long end = jiffies + msecs_to_jiffies(20000);
+
+		while (READ_ONCE(q->root_blkg) &&
+		       time_before(jiffies, end))
+			msleep(1);
+		if (READ_ONCE(q->root_blkg))
+			return -EEXIST;
+	}
+
 	new_blkg = blkg_alloc(&blkcg_root, disk, GFP_KERNEL);
 	if (!new_blkg)
 		return -ENOMEM;
--
2.47.1
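
Editorial illustration (not part of the patch): the -EEXIST failure described in the commit message can be modelled in plain userspace C. This is only a sketch under stated assumptions: `slot_table`, `slot_insert()`, `slot_delete()` and `MAX_QUEUE_IDS` are hypothetical names standing in for blkcg->blkg_tree and its insert/delete operations, and a fixed array replaces the kernel radix tree. The one behaviour it mirrors is that an insert refuses to overwrite a slot that the old disk's entry still occupies.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_QUEUE_IDS 8	/* hypothetical size, chosen only for this model */

/* Hypothetical stand-in for blkcg->blkg_tree: one slot per queue id. */
struct slot_table {
	void *slots[MAX_QUEUE_IDS];
};

/* Mirrors the radix_tree_insert() behaviour: never overwrite an occupied slot. */
static int slot_insert(struct slot_table *t, unsigned int id, void *item)
{
	assert(id < MAX_QUEUE_IDS && item != NULL);
	if (t->slots[id])
		return -EEXIST;
	t->slots[id] = item;
	return 0;
}

/* Models the cleanup path: the old disk's release eventually frees the slot. */
static void slot_delete(struct slot_table *t, unsigned int id)
{
	assert(id < MAX_QUEUE_IDS);
	t->slots[id] = NULL;
}
```

In this model, inserting a new entry for a queue id fails while the old entry is present and succeeds once `slot_delete()` has run, which is the ordering the patch tries to enforce between blkcg_exit_disk() and blkcg_init_disk().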
* Re: [PATCH] blk-cgroup: wait for blkcg cleanup before initializing new disk
2026-03-09 10:09 [PATCH] blk-cgroup: wait for blkcg cleanup before initializing new disk Ming Lei
@ 2026-03-10 13:20 ` Christoph Hellwig
2026-03-11 2:13 ` Ming Lei
0 siblings, 1 reply; 3+ messages in thread
From: Christoph Hellwig @ 2026-03-10 13:20 UTC
To: Ming Lei; +Cc: Jens Axboe, linux-block, Yi Zhang, Tejun Heo
On Mon, Mar 09, 2026 at 06:09:06PM +0800, Ming Lei wrote:
> +	if (READ_ONCE(q->root_blkg)) {
> +		/* 20s is a random timeout, disk_release() should be done well before */
> +		unsigned long end = jiffies + msecs_to_jiffies(20000);
> +
> +		while (READ_ONCE(q->root_blkg) &&
> +		       time_before(jiffies, end))
> +			msleep(1);
> +		if (READ_ONCE(q->root_blkg))
> +			return -EEXIST;
Random sleeps are almost never a good idea. If you don't want to waste
a waitqueue on this use wake_up_var/wait_on_var, but make it properly
wait for an event.
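
Editorial illustration (not part of the reply): the difference between polling with msleep() and waiting for an event can be sketched in userspace C with a POSIX condition variable. This is an analogy only, not the V2 patch and not kernel code. `model_queue`, `model_exit_disk()`, `model_wait_for_cleanup()` and `model_cleanup_thread()` are hypothetical names, and the mutex/condvar pair stands in for the kernel's variable-wait helpers (to my knowledge wake_up_var() and wait_var_event() in <linux/wait_bit.h>). The cleanup side clears the pointer and wakes the waiter; the init side sleeps until woken or until a timeout expires.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical userspace stand-in for a queue carrying a root_blkg pointer. */
struct model_queue {
	pthread_mutex_t lock;
	pthread_cond_t cleared;	/* signalled when root_blkg becomes NULL */
	void *root_blkg;
};

static void model_queue_init(struct model_queue *q, void *root)
{
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->cleared, NULL);
	q->root_blkg = root;
}

/* Cleanup side: clear the pointer, then wake any waiter. */
static void model_exit_disk(struct model_queue *q)
{
	pthread_mutex_lock(&q->lock);
	q->root_blkg = NULL;
	pthread_cond_broadcast(&q->cleared);
	pthread_mutex_unlock(&q->lock);
}

/* Thread entry point so the cleanup can run asynchronously. */
static void *model_cleanup_thread(void *arg)
{
	model_exit_disk(arg);
	return NULL;
}

/*
 * Init side: block until cleanup is signalled or timeout_ms elapses.
 * Returns 0 if root_blkg was cleared, -EEXIST if it is still set.
 */
static int model_wait_for_cleanup(struct model_queue *q, int timeout_ms)
{
	struct timespec deadline;
	int ret = 0;

	assert(timeout_ms >= 0);
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&q->lock);
	/* Re-check the condition after every wakeup; stop on timeout. */
	while (q->root_blkg && ret == 0)
		ret = pthread_cond_timedwait(&q->cleared, &q->lock, &deadline);
	ret = q->root_blkg ? -EEXIST : 0;
	pthread_mutex_unlock(&q->lock);
	return ret;
}
```

The waiter here wakes as soon as the cleanup signals rather than on the next 1 ms tick, and it consumes no CPU time while blocked. Whether a timeout is still wanted with an event-based wait is a design choice this sketch does not settle.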
* Re: [PATCH] blk-cgroup: wait for blkcg cleanup before initializing new disk
2026-03-10 13:20 ` Christoph Hellwig
@ 2026-03-11 2:13 ` Ming Lei
0 siblings, 0 replies; 3+ messages in thread
From: Ming Lei @ 2026-03-11 2:13 UTC
To: Christoph Hellwig; +Cc: Jens Axboe, linux-block, Yi Zhang, Tejun Heo
On Tue, Mar 10, 2026 at 06:20:25AM -0700, Christoph Hellwig wrote:
> On Mon, Mar 09, 2026 at 06:09:06PM +0800, Ming Lei wrote:
> > +	if (READ_ONCE(q->root_blkg)) {
> > +		/* 20s is a random timeout, disk_release() should be done well before */
> > +		unsigned long end = jiffies + msecs_to_jiffies(20000);
> > +
> > +		while (READ_ONCE(q->root_blkg) &&
> > +		       time_before(jiffies, end))
> > +			msleep(1);
> > +		if (READ_ONCE(q->root_blkg))
> > +			return -EEXIST;
>
> Random sleeps are almost never a good idea. If you don't want to waste
> a waitqueue on this use wake_up_var/wait_on_var, but make it properly
> wait for an event.
The two wait-variable helpers look pretty handy; I will use them in V2.
Thanks,
Ming