[PATCH] blk-cgroup: wait for blkcg cleanup before initializing new disk


* [PATCH] blk-cgroup: wait for blkcg cleanup before initializing new disk
@ 2026-03-09 10:09 Ming Lei
  2026-03-10 13:20 ` Christoph Hellwig
From: Ming Lei @ 2026-03-09 10:09 UTC
  To: Jens Axboe, linux-block; +Cc: Ming Lei, Yi Zhang, Tejun Heo

When a queue is shared across disk rebind (e.g., SCSI unbind/bind), the
previous disk's blkcg state is cleaned up asynchronously via
disk_release() -> blkcg_exit_disk(). If the new disk's blkcg_init_disk()
runs before that cleanup finishes, we may overwrite q->root_blkg while
the old one is still alive, and radix_tree_insert() in blkg_create()
fails with -EEXIST because the old blkg entries still occupy the same
queue id slot in blkcg->blkg_tree. This causes the sd probe to fail
with -ENOMEM.

Fix it by waiting in blkcg_init_disk() for root_blkg to become NULL,
which indicates the previous disk's blkcg cleanup has completed.

Fixes: 1059699f87eb ("block: move blkcg initialization/destroy into disk allocation/release handler")
Cc: Yi Zhang <yi.zhang@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-cgroup.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index b70096497d38..7aa2ed7f7c82 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1498,6 +1498,27 @@ int blkcg_init_disk(struct gendisk *disk)
 	struct blkcg_gq *new_blkg, *blkg;
 	bool preloaded;
 
+	/*
+	 * If the queue is shared across disk rebind (e.g., SCSI), the
+	 * previous disk's blkcg state is cleaned up asynchronously via
+	 * disk_release() -> blkcg_exit_disk(). Wait for that cleanup to
+	 * finish (indicated by root_blkg becoming NULL) before setting up
+	 * new blkcg state. Otherwise, we may overwrite q->root_blkg while
+	 * the old one is still alive, and radix_tree_insert() in
+	 * blkg_create() will fail with -EEXIST because the old entries
+	 * still occupy the same queue id slot in blkcg->blkg_tree.
+	 */
+	if (READ_ONCE(q->root_blkg)) {
+		/* 20s is a random timeout, disk_release() should be done well before */
+		unsigned long end = jiffies + msecs_to_jiffies(20000);
+
+		while (READ_ONCE(q->root_blkg) &&
+				time_before(jiffies, end))
+			msleep(1);
+		if (READ_ONCE(q->root_blkg))
+			return -EEXIST;
+	}
+
 	new_blkg = blkg_alloc(&blkcg_root, disk, GFP_KERNEL);
 	if (!new_blkg)
 		return -ENOMEM;
-- 
2.47.1



* Re: [PATCH] blk-cgroup: wait for blkcg cleanup before initializing new disk
  2026-03-09 10:09 [PATCH] blk-cgroup: wait for blkcg cleanup before initializing new disk Ming Lei
@ 2026-03-10 13:20 ` Christoph Hellwig
  2026-03-11  2:13   ` Ming Lei
From: Christoph Hellwig @ 2026-03-10 13:20 UTC
  To: Ming Lei; +Cc: Jens Axboe, linux-block, Yi Zhang, Tejun Heo

On Mon, Mar 09, 2026 at 06:09:06PM +0800, Ming Lei wrote:
> +	if (READ_ONCE(q->root_blkg)) {
> +		/* 20s is a random timeout, disk_release() should be done well before */
> +		unsigned long end = jiffies + msecs_to_jiffies(20000);
> +
> +		while (READ_ONCE(q->root_blkg) &&
> +				time_before(jiffies, end))
> +			msleep(1);
> +		if (READ_ONCE(q->root_blkg))
> +			return -EEXIST;

Random sleeps are almost never a good idea.  If you don't want to waste
a waitqueue on this use wake_up_var/wait_on_var, but make it properly
wait for an event.



* Re: [PATCH] blk-cgroup: wait for blkcg cleanup before initializing new disk
  2026-03-10 13:20 ` Christoph Hellwig
@ 2026-03-11  2:13   ` Ming Lei
From: Ming Lei @ 2026-03-11  2:13 UTC
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-block, Yi Zhang, Tejun Heo

On Tue, Mar 10, 2026 at 06:20:25AM -0700, Christoph Hellwig wrote:
> On Mon, Mar 09, 2026 at 06:09:06PM +0800, Ming Lei wrote:
> > +	if (READ_ONCE(q->root_blkg)) {
> > +		/* 20s is a random timeout, disk_release() should be done well before */
> > +		unsigned long end = jiffies + msecs_to_jiffies(20000);
> > +
> > +		while (READ_ONCE(q->root_blkg) &&
> > +				time_before(jiffies, end))
> > +			msleep(1);
> > +		if (READ_ONCE(q->root_blkg))
> > +			return -EEXIST;
> 
> Random sleeps are almost never a good idea.  If you don't want to waste
> a waitqueue on this use wake_up_var/wait_on_var, but make it properly
> wait for an event.

The two wait-variable helpers look pretty handy; I will use them in V2.

Thanks,
Ming
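
Editorial sketch (not part of the thread, and not the V2 patch): the
review comment above contrasts polling with msleep() against waiting for
an event. In the kernel, the helpers being referred to are, to the best
of my knowledge, wake_up_var() and the wait_var_event() family declared
in include/linux/wait_bit.h. The block below is only a userspace analogy
of that pattern built on a pthread condition variable. Every name in it
(fake_queue, fake_queue_release_root, fake_queue_wait_released) is
hypothetical and does not exist in the kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the queue; NOT struct request_queue. */
struct fake_queue {
	pthread_mutex_t lock;
	pthread_cond_t released;	/* signalled when root becomes NULL */
	void *root;			/* plays the role of q->root_blkg */
};

static void fake_queue_init(struct fake_queue *q, void *root)
{
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->released, NULL);
	q->root = root;
}

/* Cleanup side: clear the pointer, then wake every waiter. */
static void fake_queue_release_root(struct fake_queue *q)
{
	pthread_mutex_lock(&q->lock);
	q->root = NULL;
	pthread_cond_broadcast(&q->released);
	pthread_mutex_unlock(&q->lock);
}

/*
 * Init side: sleep until root is NULL or timeout_ms has elapsed.
 * Returns 0 once the old state is gone, -ETIMEDOUT if it is still there.
 */
static int fake_queue_wait_released(struct fake_queue *q, long timeout_ms)
{
	struct timespec deadline;
	int err = 0;

	/* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&q->lock);
	/* Re-check the condition after each wakeup; wakeups may be spurious. */
	while (q->root && err == 0)
		err = pthread_cond_timedwait(&q->released, &q->lock, &deadline);
	err = q->root ? -ETIMEDOUT : 0;
	pthread_mutex_unlock(&q->lock);
	return err;
}
```

In this analogy the cleanup path would call fake_queue_release_root()
from its own thread, and the init path would call
fake_queue_wait_released() instead of looping on msleep(1). The waiter
then wakes as soon as the pointer is cleared rather than on the next
polling tick, and the timeout remains only as a safety net.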

