linux-block.vger.kernel.org archive mirror
* [PATCH] ublk_drv: fix request queue leak
@ 2022-07-14 10:32 Ming Lei
  2022-07-14 13:00 ` Jens Axboe
  2022-07-14 13:13 ` Christoph Hellwig
  0 siblings, 2 replies; 12+ messages in thread
From: Ming Lei @ 2022-07-14 10:32 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Ming Lei

Call blk_cleanup_queue() in the release code path to fix a request
queue leak.

Also, for-5.20/block has cleaned up blk_cleanup_queue(), which is
basically merged into del_gendisk() if blk_mq_alloc_disk() is used
to allocate the disk and queue.

However, ublk may not add the disk if starting the device fails, in which
case del_gendisk() won't be called when removing the ublk device, so
blk_mq_exit_queue() will not be called, and it can be a bit hard to deal
with this kind of merge conflict.

It turns out ublk's queue/disk usage model is very similar to SCSI's, so
switch to SCSI's model by allocating the disk and queue independently; then
it is quite easy to handle the v5.20 merge conflict by replacing
blk_cleanup_queue() with blk_mq_destroy_queue().

Reported-by: Jens Axboe <axboe@kernel.dk>
Fixes: 3fee8d7599e1 ("ublk_drv: add io_uring based userspace block driver")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/ublk_drv.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 35fa06ee70ff..eeeac43e1dc1 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -155,6 +155,8 @@ static DEFINE_MUTEX(ublk_ctl_mutex);
 
 static struct miscdevice ublk_misc;
 
+static struct lock_class_key ublk_bio_compl_lkclass;
+
 static inline bool ublk_can_use_task_work(const struct ublk_queue *ubq)
 {
 	if (IS_BUILTIN(CONFIG_BLK_DEV_UBLK) &&
@@ -634,7 +636,7 @@ static void ublk_commit_rqs(struct blk_mq_hw_ctx *hctx)
 static int ublk_init_hctx(struct blk_mq_hw_ctx *hctx, void *driver_data,
 		unsigned int hctx_idx)
 {
-	struct ublk_device *ub = hctx->queue->queuedata;
+	struct ublk_device *ub = driver_data;
 	struct ublk_queue *ubq = ublk_get_queue(ub, hctx->queue_num);
 
 	hctx->driver_data = ubq;
@@ -1076,6 +1078,8 @@ static void ublk_cdev_rel(struct device *dev)
 {
 	struct ublk_device *ub = container_of(dev, struct ublk_device, cdev_dev);
 
+	blk_cleanup_queue(ub->ub_queue);
+
 	put_disk(ub->ub_disk);
 
 	blk_mq_free_tag_set(&ub->tag_set);
@@ -1165,14 +1169,17 @@ static int ublk_add_dev(struct ublk_device *ub)
 	if (err)
 		goto out_deinit_queues;
 
-	disk = ub->ub_disk = blk_mq_alloc_disk(&ub->tag_set, ub);
+	ub->ub_queue = blk_mq_init_queue(&ub->tag_set);
+	if (IS_ERR(ub->ub_queue))
+		goto out_cleanup_tags;
+	ub->ub_queue->queuedata = ub;
+
+	disk = ub->ub_disk = __alloc_disk_node(ub->ub_queue, NUMA_NO_NODE,
+			&ublk_bio_compl_lkclass);
 	if (IS_ERR(disk)) {
 		err = PTR_ERR(disk);
-		goto out_cleanup_tags;
+		goto out_free_request_queue;
 	}
-	ub->ub_queue = ub->ub_disk->queue;
-
-	ub->ub_queue->queuedata = ub;
 
 	blk_queue_logical_block_size(ub->ub_queue, bsize);
 	blk_queue_physical_block_size(ub->ub_queue, bsize);
@@ -1204,6 +1211,8 @@ static int ublk_add_dev(struct ublk_device *ub)
 
 	return 0;
 
+out_free_request_queue:
+	blk_cleanup_queue(ub->ub_queue);
 out_cleanup_tags:
 	blk_mq_free_tag_set(&ub->tag_set);
 out_deinit_queues:
-- 
2.31.1
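
The error path added to ublk_add_dev() follows the usual goto ladder:
each label undoes one setup step and falls through to the labels for the
earlier steps. Below is a minimal user-space sketch of that pattern, not
the driver code. struct toy_dev, enum toy_fail, toy_add_dev() and
toy_dev_is_clean() are hypothetical names, and plain booleans stand in
for the tag set, queue and disk.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct ublk_device; only tracks what is live. */
struct toy_dev {
	bool tags_live;		/* models ub->tag_set */
	bool queue_live;	/* models ub->ub_queue */
	bool disk_live;		/* models ub->ub_disk */
};

/* Selects which setup step fails, so each unwind label can be exercised. */
enum toy_fail { FAIL_NONE, FAIL_QUEUE, FAIL_DISK };

static int toy_add_dev(struct toy_dev *d, enum toy_fail fail)
{
	int err;

	assert(d);
	d->tags_live = true;		/* tag set allocation succeeded */

	if (fail == FAIL_QUEUE) {
		err = -ENOMEM;		/* set err on every failing branch */
		goto out_cleanup_tags;
	}
	d->queue_live = true;

	if (fail == FAIL_DISK) {
		err = -ENOMEM;
		goto out_free_request_queue;
	}
	d->disk_live = true;
	return 0;

out_free_request_queue:
	d->queue_live = false;		/* models blk_cleanup_queue() */
out_cleanup_tags:
	d->tags_live = false;		/* models blk_mq_free_tag_set() */
	return err;
}

/* Returns true when a failed setup left nothing allocated. */
static bool toy_dev_is_clean(const struct toy_dev *d)
{
	return !d->tags_live && !d->queue_live && !d->disk_live;
}
```

The sketch assigns err on every failing branch. In the hunk above, the
IS_ERR(ub->ub_queue) branch jumps to out_cleanup_tags without assigning
err, so it may be worth checking that the function cannot return 0 on
that path.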



* Re: [PATCH] ublk_drv: fix request queue leak
  2022-07-14 10:32 [PATCH] ublk_drv: fix request queue leak Ming Lei
@ 2022-07-14 13:00 ` Jens Axboe
  2022-07-14 13:10   ` Ming Lei
  2022-07-14 13:13 ` Christoph Hellwig
  1 sibling, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2022-07-14 13:00 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-block

On 7/14/22 4:32 AM, Ming Lei wrote:
> Call blk_cleanup_queue() in the release code path to fix a request
> queue leak.
> 
> Also, for-5.20/block has cleaned up blk_cleanup_queue(), which is
> basically merged into del_gendisk() if blk_mq_alloc_disk() is used
> to allocate the disk and queue.
> 
> However, ublk may not add the disk if starting the device fails, in which
> case del_gendisk() won't be called when removing the ublk device, so
> blk_mq_exit_queue() will not be called, and it can be a bit hard to deal
> with this kind of merge conflict.
> 
> It turns out ublk's queue/disk usage model is very similar to SCSI's, so
> switch to SCSI's model by allocating the disk and queue independently; then
> it is quite easy to handle the v5.20 merge conflict by replacing
> blk_cleanup_queue() with blk_mq_destroy_queue().

Tried this with the below incremental added to make it compile with
the core block changes too, and it still fails for me:

[   22.488660] WARNING: CPU: 0 PID: 11 at block/blk-mq.c:3880 blk_mq_release+0xa4/0xf0
[   22.490797] Modules linked in:
[   22.491762] CPU: 0 PID: 11 Comm: kworker/0:1 Not tainted 5.19.0-rc6-00322-g42ed61fe42f3-dirty #1609
[   22.494659] Hardware name: linux,dummy-virt (DT)
[   22.496171] Workqueue: events blkg_free_workfn
[   22.497652] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   22.499965] pc : blk_mq_release+0xa4/0xf0
[   22.501386] lr : blk_mq_release+0x44/0xf0
[   22.502748] sp : ffff80000af73cb0
[   22.503880] x29: ffff80000af73cb0 x28: 0000000000000000 x27: 0000000000000000
[   22.506263] x26: 0000000000000000 x25: ffff00001fe47b05 x24: 0000000000000000
[   22.508655] x23: ffff0000052b6cb8 x22: ffff0000031e1c38 x21: 0000000000000000
[   22.511035] x20: ffff0000031e1cf0 x19: ffff0000031e1bf0 x18: 0000000000000000
[   22.513427] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffa8000b80
[   22.515814] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000001
[   22.518209] x11: ffff80000945b7e8 x10: 0000000000006cb9 x9 : 00000000ffffffff
[   22.520600] x8 : ffff800008fb5000 x7 : ffff80000860cf28 x6 : 0000000000000000
[   22.522987] x5 : 0000000000000000 x4 : 0000000000000028 x3 : ffff80000af73c14
[   22.525363] x2 : ffff0000071ccaa8 x1 : ffff0000071ccaa8 x0 : ffff0000071cc800
[   22.527624] Call trace:
[   22.528473]  blk_mq_release+0xa4/0xf0
[   22.529724]  blk_release_queue+0x58/0xa0
[   22.530946]  kobject_put+0x84/0xe0
[   22.531821]  blk_put_queue+0x10/0x18
[   22.532716]  blkg_free_workfn+0x58/0x84
[   22.533681]  process_one_work+0x2ac/0x438
[   22.534872]  worker_thread+0x1cc/0x264
[   22.535829]  kthread+0xd0/0xe0
[   22.536598]  ret_from_fork+0x10/0x20


diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index eeeac43e1dc1..d818da818c00 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -1078,7 +1078,7 @@ static void ublk_cdev_rel(struct device *dev)
 {
 	struct ublk_device *ub = container_of(dev, struct ublk_device, cdev_dev);
 
-	blk_cleanup_queue(ub->ub_queue);
+	blk_put_queue(ub->ub_queue);
 
 	put_disk(ub->ub_disk);
 
@@ -1174,8 +1174,8 @@ static int ublk_add_dev(struct ublk_device *ub)
 		goto out_cleanup_tags;
 	ub->ub_queue->queuedata = ub;
 
-	disk = ub->ub_disk = __alloc_disk_node(ub->ub_queue, NUMA_NO_NODE,
-			&ublk_bio_compl_lkclass);
+	disk = ub->ub_disk = blk_mq_alloc_disk_for_queue(ub->ub_queue,
+						 &ublk_bio_compl_lkclass);
 	if (IS_ERR(disk)) {
 		err = PTR_ERR(disk);
 		goto out_free_request_queue;
@@ -1212,7 +1212,7 @@ static int ublk_add_dev(struct ublk_device *ub)
 	return 0;
 
 out_free_request_queue:
-	blk_cleanup_queue(ub->ub_queue);
+	blk_put_queue(ub->ub_queue);
 out_cleanup_tags:
 	blk_mq_free_tag_set(&ub->tag_set);
 out_deinit_queues:


-- 
Jens Axboe



* Re: [PATCH] ublk_drv: fix request queue leak
  2022-07-14 13:00 ` Jens Axboe
@ 2022-07-14 13:10   ` Ming Lei
  2022-07-14 13:14     ` Jens Axboe
  0 siblings, 1 reply; 12+ messages in thread
From: Ming Lei @ 2022-07-14 13:10 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, ming.lei

On Thu, Jul 14, 2022 at 07:00:59AM -0600, Jens Axboe wrote:
> On 7/14/22 4:32 AM, Ming Lei wrote:
> > Call blk_cleanup_queue() in the release code path to fix a request
> > queue leak.
> > 
> > Also, for-5.20/block has cleaned up blk_cleanup_queue(), which is
> > basically merged into del_gendisk() if blk_mq_alloc_disk() is used
> > to allocate the disk and queue.
> > 
> > However, ublk may not add the disk if starting the device fails, in which
> > case del_gendisk() won't be called when removing the ublk device, so
> > blk_mq_exit_queue() will not be called, and it can be a bit hard to deal
> > with this kind of merge conflict.
> > 
> > It turns out ublk's queue/disk usage model is very similar to SCSI's, so
> > switch to SCSI's model by allocating the disk and queue independently; then
> > it is quite easy to handle the v5.20 merge conflict by replacing
> > blk_cleanup_queue() with blk_mq_destroy_queue().
> 
> Tried this with the below incremental added to make it compile with
> the core block changes too, and it still fails for me:
> 
> [   22.488660] WARNING: CPU: 0 PID: 11 at block/blk-mq.c:3880 blk_mq_release+0xa4/0xf0
> [   22.490797] Modules linked in:
> [   22.491762] CPU: 0 PID: 11 Comm: kworker/0:1 Not tainted 5.19.0-rc6-00322-g42ed61fe42f3-dirty #1609
> [   22.494659] Hardware name: linux,dummy-virt (DT)
> [   22.496171] Workqueue: events blkg_free_workfn
> [   22.497652] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [   22.499965] pc : blk_mq_release+0xa4/0xf0
> [   22.501386] lr : blk_mq_release+0x44/0xf0
> [   22.502748] sp : ffff80000af73cb0
> [   22.503880] x29: ffff80000af73cb0 x28: 0000000000000000 x27: 0000000000000000
> [   22.506263] x26: 0000000000000000 x25: ffff00001fe47b05 x24: 0000000000000000
> [   22.508655] x23: ffff0000052b6cb8 x22: ffff0000031e1c38 x21: 0000000000000000
> [   22.511035] x20: ffff0000031e1cf0 x19: ffff0000031e1bf0 x18: 0000000000000000
> [   22.513427] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffa8000b80
> [   22.515814] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000001
> [   22.518209] x11: ffff80000945b7e8 x10: 0000000000006cb9 x9 : 00000000ffffffff
> [   22.520600] x8 : ffff800008fb5000 x7 : ffff80000860cf28 x6 : 0000000000000000
> [   22.522987] x5 : 0000000000000000 x4 : 0000000000000028 x3 : ffff80000af73c14
> [   22.525363] x2 : ffff0000071ccaa8 x1 : ffff0000071ccaa8 x0 : ffff0000071cc800
> [   22.527624] Call trace:
> [   22.528473]  blk_mq_release+0xa4/0xf0
> [   22.529724]  blk_release_queue+0x58/0xa0
> [   22.530946]  kobject_put+0x84/0xe0
> [   22.531821]  blk_put_queue+0x10/0x18
> [   22.532716]  blkg_free_workfn+0x58/0x84
> [   22.533681]  process_one_work+0x2ac/0x438
> [   22.534872]  worker_thread+0x1cc/0x264
> [   22.535829]  kthread+0xd0/0xe0
> [   22.536598]  ret_from_fork+0x10/0x20
> 
> 
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index eeeac43e1dc1..d818da818c00 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -1078,7 +1078,7 @@ static void ublk_cdev_rel(struct device *dev)
>  {
>  	struct ublk_device *ub = container_of(dev, struct ublk_device, cdev_dev);
>  
> -	blk_cleanup_queue(ub->ub_queue);
> +	blk_put_queue(ub->ub_queue);

I guess you ran the test on for-next; it should work by just replacing
the two blk_cleanup_queue() calls with blk_mq_destroy_queue().


Thanks,
Ming



* Re: [PATCH] ublk_drv: fix request queue leak
  2022-07-14 10:32 [PATCH] ublk_drv: fix request queue leak Ming Lei
  2022-07-14 13:00 ` Jens Axboe
@ 2022-07-14 13:13 ` Christoph Hellwig
  2022-07-14 13:20   ` Ming Lei
  1 sibling, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2022-07-14 13:13 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block

On Thu, Jul 14, 2022 at 06:32:01PM +0800, Ming Lei wrote:
> However, ublk may not add the disk if starting the device fails, in which
> case del_gendisk() won't be called when removing the ublk device, so
> blk_mq_exit_queue() will not be called, and it can be a bit hard to deal
> with this kind of merge conflict.

So base it on a tree that has everything you need.

> It turns out ublk's queue/disk usage model is very similar to SCSI's, so
> switch to SCSI's model by allocating the disk and queue independently; then
> it is quite easy to handle the v5.20 merge conflict by replacing
> blk_cleanup_queue() with blk_mq_destroy_queue().

Don't do that.  That thing really is a workaround for the lack of admin
queues in SCSI.  Nothing newly designed should use it.  It will not
allow us to optimize things and will cause a maintenance burden down the road.

Please fix the lifetime problems properly.


* Re: [PATCH] ublk_drv: fix request queue leak
  2022-07-14 13:10   ` Ming Lei
@ 2022-07-14 13:14     ` Jens Axboe
  2022-07-14 13:24       ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2022-07-14 13:14 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-block

On 7/14/22 7:10 AM, Ming Lei wrote:
> On Thu, Jul 14, 2022 at 07:00:59AM -0600, Jens Axboe wrote:
>> On 7/14/22 4:32 AM, Ming Lei wrote:
>>> Call blk_cleanup_queue() in the release code path to fix a request
>>> queue leak.
>>>
>>> Also, for-5.20/block has cleaned up blk_cleanup_queue(), which is
>>> basically merged into del_gendisk() if blk_mq_alloc_disk() is used
>>> to allocate the disk and queue.
>>>
>>> However, ublk may not add the disk if starting the device fails, in which
>>> case del_gendisk() won't be called when removing the ublk device, so
>>> blk_mq_exit_queue() will not be called, and it can be a bit hard to deal
>>> with this kind of merge conflict.
>>>
>>> It turns out ublk's queue/disk usage model is very similar to SCSI's, so
>>> switch to SCSI's model by allocating the disk and queue independently; then
>>> it is quite easy to handle the v5.20 merge conflict by replacing
>>> blk_cleanup_queue() with blk_mq_destroy_queue().
>>
>> Tried this with the below incremental added to make it compile with
>> the core block changes too, and it still fails for me:
>>
>> [   22.488660] WARNING: CPU: 0 PID: 11 at block/blk-mq.c:3880 blk_mq_release+0xa4/0xf0
>> [   22.490797] Modules linked in:
>> [   22.491762] CPU: 0 PID: 11 Comm: kworker/0:1 Not tainted 5.19.0-rc6-00322-g42ed61fe42f3-dirty #1609
>> [   22.494659] Hardware name: linux,dummy-virt (DT)
>> [   22.496171] Workqueue: events blkg_free_workfn
>> [   22.497652] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> [   22.499965] pc : blk_mq_release+0xa4/0xf0
>> [   22.501386] lr : blk_mq_release+0x44/0xf0
>> [   22.502748] sp : ffff80000af73cb0
>> [   22.503880] x29: ffff80000af73cb0 x28: 0000000000000000 x27: 0000000000000000
>> [   22.506263] x26: 0000000000000000 x25: ffff00001fe47b05 x24: 0000000000000000
>> [   22.508655] x23: ffff0000052b6cb8 x22: ffff0000031e1c38 x21: 0000000000000000
>> [   22.511035] x20: ffff0000031e1cf0 x19: ffff0000031e1bf0 x18: 0000000000000000
>> [   22.513427] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffa8000b80
>> [   22.515814] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000001
>> [   22.518209] x11: ffff80000945b7e8 x10: 0000000000006cb9 x9 : 00000000ffffffff
>> [   22.520600] x8 : ffff800008fb5000 x7 : ffff80000860cf28 x6 : 0000000000000000
>> [   22.522987] x5 : 0000000000000000 x4 : 0000000000000028 x3 : ffff80000af73c14
>> [   22.525363] x2 : ffff0000071ccaa8 x1 : ffff0000071ccaa8 x0 : ffff0000071cc800
>> [   22.527624] Call trace:
>> [   22.528473]  blk_mq_release+0xa4/0xf0
>> [   22.529724]  blk_release_queue+0x58/0xa0
>> [   22.530946]  kobject_put+0x84/0xe0
>> [   22.531821]  blk_put_queue+0x10/0x18
>> [   22.532716]  blkg_free_workfn+0x58/0x84
>> [   22.533681]  process_one_work+0x2ac/0x438
>> [   22.534872]  worker_thread+0x1cc/0x264
>> [   22.535829]  kthread+0xd0/0xe0
>> [   22.536598]  ret_from_fork+0x10/0x20
>>
>>
>> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
>> index eeeac43e1dc1..d818da818c00 100644
>> --- a/drivers/block/ublk_drv.c
>> +++ b/drivers/block/ublk_drv.c
>> @@ -1078,7 +1078,7 @@ static void ublk_cdev_rel(struct device *dev)
>>  {
>>  	struct ublk_device *ub = container_of(dev, struct ublk_device, cdev_dev);
>>  
>> -	blk_cleanup_queue(ub->ub_queue);
>> +	blk_put_queue(ub->ub_queue);
> 
> I guess you ran the test on for-next; it should work by just replacing
> the two blk_cleanup_queue() calls with blk_mq_destroy_queue().

Ah yes, that does the trick. I think I'll migrate the driver to the core
branch instead to avoid these issues.

-- 
Jens Axboe



* Re: [PATCH] ublk_drv: fix request queue leak
  2022-07-14 13:13 ` Christoph Hellwig
@ 2022-07-14 13:20   ` Ming Lei
  2022-07-14 13:23     ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: Ming Lei @ 2022-07-14 13:20 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-block

On Thu, Jul 14, 2022 at 06:13:41AM -0700, Christoph Hellwig wrote:
> On Thu, Jul 14, 2022 at 06:32:01PM +0800, Ming Lei wrote:
> > However, ublk may not add the disk if starting the device fails, in which
> > case del_gendisk() won't be called when removing the ublk device, so
> > blk_mq_exit_queue() will not be called, and it can be a bit hard to deal
> > with this kind of merge conflict.
> 
> So base it on a tree that has everything you need.
> 
> > It turns out ublk's queue/disk usage model is very similar to SCSI's, so
> > switch to SCSI's model by allocating the disk and queue independently; then
> > it is quite easy to handle the v5.20 merge conflict by replacing
> > blk_cleanup_queue() with blk_mq_destroy_queue().
> 
> Don't do that.  That thing really is a workaround for the lack of admin
> queues in SCSI.  Nothing newly designed should use it.  It will not
> allow us to optimize things and will cause a maintenance burden down the road.

The problem is that you moved part of blk_cleanup_queue() into
del_gendisk().

Here, the issue Jens reproduced is that we haven't added the disk yet, so
del_gendisk() won't be called. The queue and disk are allocated and
initialized correctly.

Then how do we do the part done by the original blk_cleanup_queue() without
calling blk_mq_destroy_queue()?


Thanks,
Ming



* Re: [PATCH] ublk_drv: fix request queue leak
  2022-07-14 13:20   ` Ming Lei
@ 2022-07-14 13:23     ` Christoph Hellwig
  2022-07-14 13:26       ` Ming Lei
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2022-07-14 13:23 UTC (permalink / raw)
  To: Ming Lei; +Cc: Christoph Hellwig, Jens Axboe, linux-block

On Thu, Jul 14, 2022 at 09:20:24PM +0800, Ming Lei wrote:
> The problem is that you moved part of blk_cleanup_queue() into
> del_gendisk().
> 
> Here, the issue Jens reproduced is that we haven't added the disk yet, so
> del_gendisk() won't be called. The queue and disk are allocated and
> initialized correctly.
> 
> Then how do we do the part done by the original blk_cleanup_queue() without
> calling blk_mq_destroy_queue()?

What do you need to clean up?  put_disk is supposed to eventually
clean up everything allocated by blk_alloc_disk through disk_release.
If it fails to clean up anything, that is a bug we need to fix in the core
as it will affect all drivers.



* Re: [PATCH] ublk_drv: fix request queue leak
  2022-07-14 13:14     ` Jens Axboe
@ 2022-07-14 13:24       ` Christoph Hellwig
  0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2022-07-14 13:24 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Ming Lei, linux-block

On Thu, Jul 14, 2022 at 07:14:52AM -0600, Jens Axboe wrote:
> >> -	blk_cleanup_queue(ub->ub_queue);
> >> +	blk_put_queue(ub->ub_queue);
> > 
> > I guess you ran the test on for-next; it should work by just replacing
> > the two blk_cleanup_queue() calls with blk_mq_destroy_queue().
> 
> Ah yes, that does the trick. I think I'll migrate the driver to the core
> branch instead to avoid these issues.

Please drop it for now.  It's pretty clear it did not have enough
review yet.  I'll try to allocate some time today or tomorrow to
go through it.


* Re: [PATCH] ublk_drv: fix request queue leak
  2022-07-14 13:23     ` Christoph Hellwig
@ 2022-07-14 13:26       ` Ming Lei
  2022-07-14 13:37         ` Ming Lei
  0 siblings, 1 reply; 12+ messages in thread
From: Ming Lei @ 2022-07-14 13:26 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-block

On Thu, Jul 14, 2022 at 06:23:12AM -0700, Christoph Hellwig wrote:
> On Thu, Jul 14, 2022 at 09:20:24PM +0800, Ming Lei wrote:
> > The problem is that you moved part of blk_cleanup_queue() into
> > del_gendisk().
> > 
> > Here, the issue Jens reproduced is that we haven't added the disk yet, so
> > del_gendisk() won't be called. The queue and disk are allocated and
> > initialized correctly.
> > 
> > Then how do we do the part done by the original blk_cleanup_queue() without
> > calling blk_mq_destroy_queue()?
> 
> What do you need to clean up?  put_disk is supposed to eventually
> clean up everything allocated by blk_alloc_disk through disk_release.
> If it fails to clean up anything, that is a bug we need to fix in the core
> as it will affect all drivers.

The part to be cleaned up has nothing to do with the disk:

                if (queue_is_mq(q))
                        blk_mq_exit_queue(q);

->exit_hctx() is called in blk_mq_exit_queue().

Without calling blk_mq_destroy_queue(), I don't see another way to address
this issue. Any suggestions?

Thanks,
Ming
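
The point above can be illustrated with a toy user-space model: dropping
the last reference releases the queue but does not run the exit step, so
a queue whose disk was never added needs an explicit destroy call first.
This is only a sketch. struct toy_queue and the toy_* functions are
hypothetical, and toy_destroy_queue() is assumed to run the exit step and
then drop the reference, which may not match the real
blk_mq_destroy_queue() in every tree.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: field and function names are hypothetical. */
struct toy_queue {
	int refs;		/* reference count on the queue */
	bool hctx_exited;	/* has the ->exit_hctx() step run? */
	bool released;		/* has the final release run? */
	bool warned;		/* did release find hctxs that never exited? */
};

static void toy_queue_init(struct toy_queue *q)
{
	q->refs = 1;
	q->hctx_exited = false;
	q->released = false;
	q->warned = false;
}

/* Models the blk_mq_exit_queue() step, which calls ->exit_hctx(). */
static void toy_exit_queue(struct toy_queue *q)
{
	q->hctx_exited = true;
}

/* Models blk_put_queue(): drop a reference, release on the last one. */
static void toy_put_queue(struct toy_queue *q)
{
	assert(q->refs > 0);
	if (--q->refs == 0) {
		q->released = true;
		/* Release does not run the exit step; it only notices the skip. */
		q->warned = !q->hctx_exited;
	}
}

/* Assumed shape of the destroy helper: exit step first, then the put. */
static void toy_destroy_queue(struct toy_queue *q)
{
	toy_exit_queue(q);
	toy_put_queue(q);
}
```

In this model, calling toy_put_queue() alone on a fresh queue sets
warned, loosely mirroring the WARNING in blk_mq_release() from Jens's
report, while toy_destroy_queue() releases the queue without it.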



* Re: [PATCH] ublk_drv: fix request queue leak
  2022-07-14 13:26       ` Ming Lei
@ 2022-07-14 13:37         ` Ming Lei
  2022-07-14 13:55           ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: Ming Lei @ 2022-07-14 13:37 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-block

On Thu, Jul 14, 2022 at 09:26:25PM +0800, Ming Lei wrote:
> On Thu, Jul 14, 2022 at 06:23:12AM -0700, Christoph Hellwig wrote:
> > On Thu, Jul 14, 2022 at 09:20:24PM +0800, Ming Lei wrote:
> > > The problem is that you moved part of blk_cleanup_queue() into
> > > del_gendisk().
> > > 
> > > Here, the issue Jens reproduced is that we haven't added the disk yet, so
> > > del_gendisk() won't be called. The queue and disk are allocated and
> > > initialized correctly.
> > > 
> > > Then how do we do the part done by the original blk_cleanup_queue() without
> > > calling blk_mq_destroy_queue()?
> > 
> > What do you need to clean up?  put_disk is supposed to eventually
> > clean up everything allocated by blk_alloc_disk through disk_release.
> > If it fails to clean up anything, that is a bug we need to fix in the core
> > as it will affect all drivers.
> 
> The part to be cleaned up has nothing to do with the disk:
> 
>                 if (queue_is_mq(q))
>                         blk_mq_exit_queue(q);
> 
> ->exit_hctx() is called in blk_mq_exit_queue().
> 
> Without calling blk_mq_destroy_queue(), I don't see another way to address
> this issue. Any suggestions?

This is actually one big problem with 6f8191fdf41d ("block: simplify disk shutdown"),
since blk_put_queue() can't do what blk_cleanup_queue() did.

Anywhere that uses blk_put_queue() to release a blk-mq queue before adding
the disk has the same issue.


Thanks,
Ming



* Re: [PATCH] ublk_drv: fix request queue leak
  2022-07-14 13:37         ` Ming Lei
@ 2022-07-14 13:55           ` Christoph Hellwig
  2022-07-14 14:02             ` Ming Lei
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2022-07-14 13:55 UTC (permalink / raw)
  To: Ming Lei; +Cc: Christoph Hellwig, Jens Axboe, linux-block

On Thu, Jul 14, 2022 at 09:37:10PM +0800, Ming Lei wrote:
> This is actually one big problem with 6f8191fdf41d ("block: simplify disk shutdown"),
> since blk_put_queue() can't do what blk_cleanup_queue() did.
> 
> Anywhere that uses blk_put_queue() to release a blk-mq queue before adding
> the disk has the same issue.

And the reason why blk_put_queue() can't do this seems to be mostly because
queues don't hold a reference on the tag set (and tag_sets don't have
a reference at all).  Which has caused us a bunch of issues before, so
let me see if I can fix that properly.
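
One way to picture the missing reference is a toy model in which the
queue pins the tag set, so the exit step can still run safely at the
final put even after the driver has dropped its own reference. This is a
user-space sketch under that assumption only. The refcounted
struct toy_tag_set and all toy_* functions are hypothetical and are not
a description of what the block layer does or later did.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted tag set; per the thread, the real one has no refcount. */
struct toy_tag_set {
	int refs;
	bool freed;
};

struct toy_queue {
	struct toy_tag_set *set;
	int refs;
	bool exit_saw_live_set;	/* did the exit step run while the set was valid? */
	bool released;
};

static void toy_tag_set_init(struct toy_tag_set *s)
{
	s->refs = 1;		/* the driver's own reference */
	s->freed = false;
}

static void toy_tag_set_put(struct toy_tag_set *s)
{
	assert(s->refs > 0);
	if (--s->refs == 0)
		s->freed = true;
}

static void toy_queue_init(struct toy_queue *q, struct toy_tag_set *s)
{
	s->refs++;		/* the queue pins the tag set */
	q->set = s;
	q->refs = 1;
	q->exit_saw_live_set = false;
	q->released = false;
}

static void toy_queue_put(struct toy_queue *q)
{
	assert(q->refs > 0);
	if (--q->refs > 0)
		return;
	/* Final put: the exit step may touch the set because it is still pinned. */
	q->exit_saw_live_set = !q->set->freed;
	toy_tag_set_put(q->set);
	q->released = true;
}
```

With this arrangement the driver can call toy_tag_set_put() before the
last toy_queue_put(), and the set is only marked freed once the queue
itself has been released.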


* Re: [PATCH] ublk_drv: fix request queue leak
  2022-07-14 13:55           ` Christoph Hellwig
@ 2022-07-14 14:02             ` Ming Lei
  0 siblings, 0 replies; 12+ messages in thread
From: Ming Lei @ 2022-07-14 14:02 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-block, ming.lei

On Thu, Jul 14, 2022 at 06:55:28AM -0700, Christoph Hellwig wrote:
> On Thu, Jul 14, 2022 at 09:37:10PM +0800, Ming Lei wrote:
> > This is actually one big problem with 6f8191fdf41d ("block: simplify disk shutdown"),
> > since blk_put_queue() can't do what blk_cleanup_queue() did.
> > 
> > Anywhere that uses blk_put_queue() to release a blk-mq queue before adding
> > the disk has the same issue.
> 
> And the reason why blk_put_queue() can't do this seems to be mostly because
> queues don't hold a reference on the tag set (and tag_sets don't have

Exactly.

> a reference at all).  Which has caused us a bunch of issues before, so
> let me see if I can fix that properly.

I guess it is hard to fix; I don't have any idea yet. Originally we stopped
referring to the tag set after blk_cleanup_queue(), but now you are trying to
kill blk_cleanup_queue()...


thanks,
Ming


