public inbox for linux-block@vger.kernel.org
* [PATCH] blk-cgroup: wait for blkcg cleanup before initializing new disk
@ 2026-03-09 10:09 Ming Lei
  2026-03-10 13:20 ` Christoph Hellwig
  0 siblings, 1 reply; 3+ messages in thread
From: Ming Lei @ 2026-03-09 10:09 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: Ming Lei, Yi Zhang, Tejun Heo

When a queue is shared across disk rebind (e.g., SCSI unbind/bind), the
previous disk's blkcg state is cleaned up asynchronously via
disk_release() -> blkcg_exit_disk(). If the new disk's blkcg_init_disk()
runs before that cleanup finishes, we may overwrite q->root_blkg while
the old one is still alive, and radix_tree_insert() in blkg_create()
fails with -EEXIST because the old blkg entries still occupy the same
queue id slot in blkcg->blkg_tree. This causes the sd probe to fail
with -ENOMEM.

Fix it by waiting in blkcg_init_disk() for root_blkg to become NULL,
which indicates the previous disk's blkcg cleanup has completed.

Fixes: 1059699f87eb ("block: move blkcg initialization/destroy into disk allocation/release handler")
Cc: Yi Zhang <yi.zhang@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-cgroup.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index b70096497d38..7aa2ed7f7c82 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1498,6 +1498,27 @@ int blkcg_init_disk(struct gendisk *disk)
 	struct blkcg_gq *new_blkg, *blkg;
 	bool preloaded;
 
+	/*
+	 * If the queue is shared across disk rebind (e.g., SCSI), the
+	 * previous disk's blkcg state is cleaned up asynchronously via
+	 * disk_release() -> blkcg_exit_disk(). Wait for that cleanup to
+	 * finish (indicated by root_blkg becoming NULL) before setting up
+	 * new blkcg state. Otherwise, we may overwrite q->root_blkg while
+	 * the old one is still alive, and radix_tree_insert() in
+	 * blkg_create() will fail with -EEXIST because the old entries
+	 * still occupy the same queue id slot in blkcg->blkg_tree.
+	 */
+	if (READ_ONCE(q->root_blkg)) {
+		/* 20s is an arbitrary timeout; disk_release() should finish well before */
+		unsigned long end = jiffies + msecs_to_jiffies(20000);
+
+		while (READ_ONCE(q->root_blkg) &&
+				time_before(jiffies, end))
+			msleep(1);
+		if (READ_ONCE(q->root_blkg))
+			return -EEXIST;
+	}
+
 	new_blkg = blkg_alloc(&blkcg_root, disk, GFP_KERNEL);
 	if (!new_blkg)
 		return -ENOMEM;
-- 
2.47.1
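
For readers who want to try the bounded wait outside the kernel, here is a minimal userspace sketch of the same pattern: poll a pointer until it becomes NULL, sleep about 1ms between checks, and give up with -EEXIST at a deadline. It is not the kernel code. wait_for_null() and now_ms() are hypothetical names made up for this illustration, C11 atomics stand in for READ_ONCE(), CLOCK_MONOTONIC stands in for jiffies, and nanosleep() stands in for msleep(1).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdatomic.h>
#include <time.h>

/* Monotonic clock in milliseconds; stands in for jiffies (assumption). */
static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Hypothetical helper: poll *slot until it becomes NULL or until
 * timeout_ms has elapsed. Returns 0 once the slot is clear, or
 * -EEXIST if the slot is still set when the deadline passes.
 */
static int wait_for_null(_Atomic(void *) *slot, long timeout_ms)
{
	const struct timespec one_ms = { .tv_sec = 0, .tv_nsec = 1000000 };
	long long end = now_ms() + timeout_ms;

	/* Same shape as the loop in the patch: recheck, then sleep ~1ms. */
	while (atomic_load(slot) && now_ms() < end)
		nanosleep(&one_ms, NULL);

	/* Final recheck decides between success and timeout. */
	return atomic_load(slot) ? -EEXIST : 0;
}
```

In this sketch a cleanup thread would call atomic_store(slot, NULL) when it is done, much as the patch expects root_blkg to become NULL once blkcg_exit_disk() has run. The timeout bounds how long the caller can stall if that never happens.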



Thread overview: 3+ messages
2026-03-09 10:09 [PATCH] blk-cgroup: wait for blkcg cleanup before initializing new disk Ming Lei
2026-03-10 13:20 ` Christoph Hellwig
2026-03-11  2:13   ` Ming Lei