[PATCH V2] blk-cgroup: wait for blkcg cleanup before initializing new disk

public inbox for linux-block@vger.kernel.org

* [PATCH V2] blk-cgroup: wait for blkcg cleanup before initializing new disk
@ 2026-03-11  3:28 Ming Lei
  2026-03-11  9:39 ` Christoph Hellwig
  2026-03-11 14:30 ` Jens Axboe
  0 siblings, 2 replies; 3+ messages in thread
From: Ming Lei @ 2026-03-11  3:28 UTC
  To: Jens Axboe, linux-block; +Cc: Ming Lei, Yi Zhang

When a queue is shared across disk rebind (e.g., SCSI unbind/bind), the
previous disk's blkcg state is cleaned up asynchronously via
disk_release() -> blkcg_exit_disk(). If the new disk's blkcg_init_disk()
runs before that cleanup finishes, we may overwrite q->root_blkg while
the old one is still alive, and radix_tree_insert() in blkg_create()
fails with -EEXIST because the old blkg entries still occupy the same
queue id slot in blkcg->blkg_tree. This causes the sd probe to fail
with -ENOMEM.

Fix it by waiting in blkcg_init_disk() for root_blkg to become NULL,
which indicates the previous disk's blkcg cleanup has completed.

Fixes: 1059699f87eb ("block: move blkcg initialization/destroy into disk allocation/release handler")
Cc: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
V2:
	- take wake_up_var()/wait_var_event(), suggested by Christoph
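
Note for readers: the ordering this patch enforces can be sketched in
user space. The following is only an analogue under stated assumptions,
not kernel code: a pthread mutex and condition variable stand in for
wake_up_var()/wait_var_event(), and fake_queue, fake_exit_disk(),
fake_init_disk() and demo_rebind() are hypothetical names invented for
illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the part of a request queue that matters here. */
struct fake_queue {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	void *root_blkg;	/* non-NULL while the old disk's state is alive */
};

/* Analogue of the asynchronous cleanup: clear the pointer, then wake waiters. */
static void *fake_exit_disk(void *arg)
{
	struct fake_queue *q = arg;

	pthread_mutex_lock(&q->lock);
	q->root_blkg = NULL;
	pthread_mutex_unlock(&q->lock);

	pthread_cond_broadcast(&q->cond);
	return NULL;
}

/* Analogue of the new disk's init: wait for NULL before installing state. */
static int fake_init_disk(struct fake_queue *q, void *new_blkg)
{
	pthread_mutex_lock(&q->lock);
	while (q->root_blkg != NULL)
		pthread_cond_wait(&q->cond, &q->lock);

	/* The old state is guaranteed to be gone at this point. */
	assert(q->root_blkg == NULL);
	q->root_blkg = new_blkg;
	pthread_mutex_unlock(&q->lock);
	return 0;
}

/*
 * Run one simulated rebind. The cleanup thread runs at an arbitrary time
 * relative to init. Returns 1 if the new state ended up installed, -1 if
 * the thread could not be created.
 */
static int demo_rebind(void)
{
	static int old_state, new_state;
	struct fake_queue q;
	pthread_t cleaner;
	int ok;

	pthread_mutex_init(&q.lock, NULL);
	pthread_cond_init(&q.cond, NULL);
	q.root_blkg = &old_state;

	if (pthread_create(&cleaner, NULL, fake_exit_disk, &q))
		return -1;

	fake_init_disk(&q, &new_state);
	pthread_join(cleaner, NULL);

	ok = (q.root_blkg == &new_state);
	pthread_cond_destroy(&q.cond);
	pthread_mutex_destroy(&q.lock);
	return ok;
}
```

Calling demo_rebind() from a small main() (built with -pthread) exercises
the sketch. Whichever thread runs first, the init side does not install
its state until the cleanup side has cleared root_blkg.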

 block/blk-cgroup.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index b70096497d38..2d7b18eb7291 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -24,6 +24,7 @@
 #include <linux/backing-dev.h>
 #include <linux/slab.h>
 #include <linux/delay.h>
+#include <linux/wait_bit.h>
 #include <linux/atomic.h>
 #include <linux/ctype.h>
 #include <linux/resume_user_mode.h>
@@ -611,6 +612,8 @@ static void blkg_destroy_all(struct gendisk *disk)
 
 	q->root_blkg = NULL;
 	spin_unlock_irq(&q->queue_lock);
+
+	wake_up_var(&q->root_blkg);
 }
 
 static void blkg_iostat_set(struct blkg_iostat *dst, struct blkg_iostat *src)
@@ -1498,6 +1501,18 @@ int blkcg_init_disk(struct gendisk *disk)
 	struct blkcg_gq *new_blkg, *blkg;
 	bool preloaded;
 
+	/*
+	 * If the queue is shared across disk rebind (e.g., SCSI), the
+	 * previous disk's blkcg state is cleaned up asynchronously via
+	 * disk_release() -> blkcg_exit_disk(). Wait for that cleanup to
+	 * finish (indicated by root_blkg becoming NULL) before setting up
+	 * new blkcg state. Otherwise, we may overwrite q->root_blkg while
+	 * the old one is still alive, and radix_tree_insert() in
+	 * blkg_create() will fail with -EEXIST because the old entries
+	 * still occupy the same queue id slot in blkcg->blkg_tree.
+	 */
+	wait_var_event(&q->root_blkg, !READ_ONCE(q->root_blkg));
+
 	new_blkg = blkg_alloc(&blkcg_root, disk, GFP_KERNEL);
 	if (!new_blkg)
 		return -ENOMEM;
-- 
2.47.1



* Re: [PATCH V2] blk-cgroup: wait for blkcg cleanup before initializing new disk
From: Christoph Hellwig @ 2026-03-11  9:39 UTC
  To: Ming Lei; +Cc: Jens Axboe, linux-block, Yi Zhang

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>



* Re: [PATCH V2] blk-cgroup: wait for blkcg cleanup before initializing new disk
From: Jens Axboe @ 2026-03-11 14:30 UTC
  To: linux-block, Ming Lei; +Cc: Yi Zhang


On Wed, 11 Mar 2026 11:28:37 +0800, Ming Lei wrote:
> When a queue is shared across disk rebind (e.g., SCSI unbind/bind), the
> previous disk's blkcg state is cleaned up asynchronously via
> disk_release() -> blkcg_exit_disk(). If the new disk's blkcg_init_disk()
> runs before that cleanup finishes, we may overwrite q->root_blkg while
> the old one is still alive, and radix_tree_insert() in blkg_create()
> fails with -EEXIST because the old blkg entries still occupy the same
> queue id slot in blkcg->blkg_tree. This causes the sd probe to fail
> with -ENOMEM.
> 
> [...]

Applied, thanks!

[1/1] blk-cgroup: wait for blkcg cleanup before initializing new disk
      commit: 3dbaacf6ab68f81e3375fe769a2ecdbd3ce386fd

Best regards,
-- 
Jens Axboe



