* [PATCH V2] blk-cgroup: wait for blkcg cleanup before initializing new disk
@ 2026-03-11 3:28 Ming Lei
2026-03-11 9:39 ` Christoph Hellwig
2026-03-11 14:30 ` Jens Axboe
0 siblings, 2 replies; 3+ messages in thread
From: Ming Lei @ 2026-03-11 3:28 UTC
To: Jens Axboe, linux-block; +Cc: Ming Lei, Yi Zhang
When a queue is shared across disk rebind (e.g., SCSI unbind/bind), the
previous disk's blkcg state is cleaned up asynchronously via
disk_release() -> blkcg_exit_disk(). If the new disk's blkcg_init_disk()
runs before that cleanup finishes, we may overwrite q->root_blkg while
the old one is still alive, and radix_tree_insert() in blkg_create()
fails with -EEXIST because the old blkg entries still occupy the same
queue id slot in blkcg->blkg_tree. This causes the sd probe to fail
with -ENOMEM.

Fix it by waiting in blkcg_init_disk() for q->root_blkg to become NULL,
which indicates that the previous disk's blkcg cleanup has completed.

Fixes: 1059699f87eb ("block: move blkcg initialization/destroy into disk allocation/release handler")
Cc: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
V2:
- switch to wake_up_var()/wait_var_event(), as suggested by Christoph
block/blk-cgroup.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index b70096497d38..2d7b18eb7291 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -24,6 +24,7 @@
#include <linux/backing-dev.h>
#include <linux/slab.h>
#include <linux/delay.h>
+#include <linux/wait_bit.h>
#include <linux/atomic.h>
#include <linux/ctype.h>
#include <linux/resume_user_mode.h>
@@ -611,6 +612,8 @@ static void blkg_destroy_all(struct gendisk *disk)
q->root_blkg = NULL;
spin_unlock_irq(&q->queue_lock);
+
+ wake_up_var(&q->root_blkg);
}
static void blkg_iostat_set(struct blkg_iostat *dst, struct blkg_iostat *src)
@@ -1498,6 +1501,18 @@ int blkcg_init_disk(struct gendisk *disk)
struct blkcg_gq *new_blkg, *blkg;
bool preloaded;
+ /*
+ * If the queue is shared across disk rebind (e.g., SCSI), the
+ * previous disk's blkcg state is cleaned up asynchronously via
+ * disk_release() -> blkcg_exit_disk(). Wait for that cleanup to
+ * finish (indicated by root_blkg becoming NULL) before setting up
+ * new blkcg state. Otherwise, we may overwrite q->root_blkg while
+ * the old one is still alive, and radix_tree_insert() in
+ * blkg_create() will fail with -EEXIST because the old entries
+ * still occupy the same queue id slot in blkcg->blkg_tree.
+ */
+ wait_var_event(&q->root_blkg, !READ_ONCE(q->root_blkg));
+
new_blkg = blkg_alloc(&blkcg_root, disk, GFP_KERNEL);
if (!new_blkg)
return -ENOMEM;
--
2.47.1
* Re: [PATCH V2] blk-cgroup: wait for blkcg cleanup before initializing new disk
From: Christoph Hellwig @ 2026-03-11 9:39 UTC
To: Ming Lei; +Cc: Jens Axboe, linux-block, Yi Zhang
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH V2] blk-cgroup: wait for blkcg cleanup before initializing new disk
From: Jens Axboe @ 2026-03-11 14:30 UTC
To: linux-block, Ming Lei; +Cc: Yi Zhang
On Wed, 11 Mar 2026 11:28:37 +0800, Ming Lei wrote:
> When a queue is shared across disk rebind (e.g., SCSI unbind/bind), the
> previous disk's blkcg state is cleaned up asynchronously via
> disk_release() -> blkcg_exit_disk(). If the new disk's blkcg_init_disk()
> runs before that cleanup finishes, we may overwrite q->root_blkg while
> the old one is still alive, and radix_tree_insert() in blkg_create()
> fails with -EEXIST because the old blkg entries still occupy the same
> queue id slot in blkcg->blkg_tree. This causes the sd probe to fail
> with -ENOMEM.
>
> [...]
Applied, thanks!
[1/1] blk-cgroup: wait for blkcg cleanup before initializing new disk
commit: 3dbaacf6ab68f81e3375fe769a2ecdbd3ce386fd
Best regards,
--
Jens Axboe