* [PATCH] block: fix deadlock between sd_remove & sd_release
From: Yang Yang @ 2024-07-16 8:38 UTC
To: Jens Axboe, linux-block, linux-kernel; +Cc: Yang Yang
Our test reports the following hung task:
[ 2538.459400] INFO: task "kworker/0:0":7 blocked for more than 188 seconds.
[ 2538.459427] Call trace:
[ 2538.459430] __switch_to+0x174/0x338
[ 2538.459436] __schedule+0x628/0x9c4
[ 2538.459442] schedule+0x7c/0xe8
[ 2538.459447] schedule_preempt_disabled+0x24/0x40
[ 2538.459453] __mutex_lock+0x3ec/0xf04
[ 2538.459456] __mutex_lock_slowpath+0x14/0x24
[ 2538.459459] mutex_lock+0x30/0xd8
[ 2538.459462] del_gendisk+0xdc/0x350
[ 2538.459466] sd_remove+0x30/0x60
[ 2538.459470] device_release_driver_internal+0x1c4/0x2c4
[ 2538.459474] device_release_driver+0x18/0x28
[ 2538.459478] bus_remove_device+0x15c/0x174
[ 2538.459483] device_del+0x1d0/0x358
[ 2538.459488] __scsi_remove_device+0xa8/0x198
[ 2538.459493] scsi_forget_host+0x50/0x70
[ 2538.459497] scsi_remove_host+0x80/0x180
[ 2538.459502] usb_stor_disconnect+0x68/0xf4
[ 2538.459506] usb_unbind_interface+0xd4/0x280
[ 2538.459510] device_release_driver_internal+0x1c4/0x2c4
[ 2538.459514] device_release_driver+0x18/0x28
[ 2538.459518] bus_remove_device+0x15c/0x174
[ 2538.459523] device_del+0x1d0/0x358
[ 2538.459528] usb_disable_device+0x84/0x194
[ 2538.459532] usb_disconnect+0xec/0x300
[ 2538.459537] hub_event+0xb80/0x1870
[ 2538.459541] process_scheduled_works+0x248/0x4dc
[ 2538.459545] worker_thread+0x244/0x334
[ 2538.459549] kthread+0x114/0x1bc
[ 2538.461001] INFO: task "fsck.":15415 blocked for more than 188 seconds.
[ 2538.461014] Call trace:
[ 2538.461016] __switch_to+0x174/0x338
[ 2538.461021] __schedule+0x628/0x9c4
[ 2538.461025] schedule+0x7c/0xe8
[ 2538.461030] blk_queue_enter+0xc4/0x160
[ 2538.461034] blk_mq_alloc_request+0x120/0x1d4
[ 2538.461037] scsi_execute_cmd+0x7c/0x23c
[ 2538.461040] ioctl_internal_command+0x5c/0x164
[ 2538.461046] scsi_set_medium_removal+0x5c/0xb0
[ 2538.461051] sd_release+0x50/0x94
[ 2538.461054] blkdev_put+0x190/0x28c
[ 2538.461058] blkdev_release+0x28/0x40
[ 2538.461063] __fput+0xf8/0x2a8
[ 2538.461066] __fput_sync+0x28/0x5c
[ 2538.461070] __arm64_sys_close+0x84/0xe8
[ 2538.461073] invoke_syscall+0x58/0x114
[ 2538.461078] el0_svc_common+0xac/0xe0
[ 2538.461082] do_el0_svc+0x1c/0x28
[ 2538.461087] el0_svc+0x38/0x68
[ 2538.461090] el0t_64_sync_handler+0x68/0xbc
[ 2538.461093] el0t_64_sync+0x1a8/0x1ac
T1:                                  T2:
sd_remove
 del_gendisk
  __blk_mark_disk_dead
   blk_freeze_queue_start
    ++q->mq_freeze_depth
                                     bdev_release
                                      mutex_lock(&disk->open_mutex)
                                       sd_release
                                        scsi_execute_cmd
                                         blk_queue_enter
                                          wait_event(!q->mq_freeze_depth)
  mutex_lock(&disk->open_mutex)
This is a classic ABBA deadlock. To fix the deadlock, make sure we don't
try to acquire disk->open_mutex after freezing the queue.
Signed-off-by: Yang Yang <yang.yang@vivo.com>
---
block/genhd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/genhd.c b/block/genhd.c
index 8f1f3c6b4d67..c5fca3e893a0 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -663,12 +663,12 @@ void del_gendisk(struct gendisk *disk)
*/
if (!test_bit(GD_DEAD, &disk->state))
blk_report_disk_dead(disk, false);
- __blk_mark_disk_dead(disk);
/*
* Drop all partitions now that the disk is marked dead.
*/
mutex_lock(&disk->open_mutex);
+ __blk_mark_disk_dead(disk);
xa_for_each_start(&disk->part_tbl, idx, part, 1)
drop_partition(part);
mutex_unlock(&disk->open_mutex);
--
2.34.1
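
As an illustration only (not kernel code): the ABBA pattern above can
be reproduced in userspace with two pthread mutexes standing in for
disk->open_mutex and the queue freeze. Note that q->mq_freeze_depth is
really a counter with a waitqueue, not a mutex, so this is just an
analogy:

	#include <pthread.h>
	#include <unistd.h>

	static pthread_mutex_t freeze = PTHREAD_MUTEX_INITIALIZER;     /* ~ queue freeze */
	static pthread_mutex_t open_mutex = PTHREAD_MUTEX_INITIALIZER; /* ~ disk->open_mutex */

	static void *t1(void *arg)	/* ~ sd_remove -> del_gendisk */
	{
		pthread_mutex_lock(&freeze);		/* blk_freeze_queue_start() */
		sleep(1);				/* widen the race window */
		pthread_mutex_lock(&open_mutex);	/* blocks: t2 holds it */
		pthread_mutex_unlock(&open_mutex);
		pthread_mutex_unlock(&freeze);
		return NULL;
	}

	static void *t2(void *arg)	/* ~ bdev_release -> sd_release */
	{
		pthread_mutex_lock(&open_mutex);	/* bdev_release() */
		sleep(1);
		pthread_mutex_lock(&freeze);		/* ~ blk_queue_enter() wait */
		pthread_mutex_unlock(&freeze);
		pthread_mutex_unlock(&open_mutex);
		return NULL;
	}

	int main(void)
	{
		pthread_t a, b;

		pthread_create(&a, NULL, t1, NULL);
		pthread_create(&b, NULL, t2, NULL);
		pthread_join(a, NULL);	/* never returns: ABBA deadlock */
		pthread_join(b, NULL);
		return 0;
	}

Build with 'cc -pthread'; with both threads sleeping after taking
their first lock, each ends up waiting forever for the lock the other
one holds.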
* Re: [PATCH] block: fix deadlock between sd_remove & sd_release
From: Yu Kuai @ 2024-07-16 9:15 UTC
To: Yang Yang, Jens Axboe, linux-block, linux-kernel, yukuai (C)
Hi,
On 2024/07/16 16:38, Yang Yang wrote:
> Our test reports the following hung task:
>
> [ 2538.459400] INFO: task "kworker/0:0":7 blocked for more than 188 seconds.
> [ 2538.459427] Call trace:
> [ 2538.459430] __switch_to+0x174/0x338
> [ 2538.459436] __schedule+0x628/0x9c4
> [ 2538.459442] schedule+0x7c/0xe8
> [ 2538.459447] schedule_preempt_disabled+0x24/0x40
> [ 2538.459453] __mutex_lock+0x3ec/0xf04
> [ 2538.459456] __mutex_lock_slowpath+0x14/0x24
> [ 2538.459459] mutex_lock+0x30/0xd8
> [ 2538.459462] del_gendisk+0xdc/0x350
> [ 2538.459466] sd_remove+0x30/0x60
> [ 2538.459470] device_release_driver_internal+0x1c4/0x2c4
> [ 2538.459474] device_release_driver+0x18/0x28
> [ 2538.459478] bus_remove_device+0x15c/0x174
> [ 2538.459483] device_del+0x1d0/0x358
> [ 2538.459488] __scsi_remove_device+0xa8/0x198
> [ 2538.459493] scsi_forget_host+0x50/0x70
> [ 2538.459497] scsi_remove_host+0x80/0x180
> [ 2538.459502] usb_stor_disconnect+0x68/0xf4
> [ 2538.459506] usb_unbind_interface+0xd4/0x280
> [ 2538.459510] device_release_driver_internal+0x1c4/0x2c4
> [ 2538.459514] device_release_driver+0x18/0x28
> [ 2538.459518] bus_remove_device+0x15c/0x174
> [ 2538.459523] device_del+0x1d0/0x358
> [ 2538.459528] usb_disable_device+0x84/0x194
> [ 2538.459532] usb_disconnect+0xec/0x300
> [ 2538.459537] hub_event+0xb80/0x1870
> [ 2538.459541] process_scheduled_works+0x248/0x4dc
> [ 2538.459545] worker_thread+0x244/0x334
> [ 2538.459549] kthread+0x114/0x1bc
>
> [ 2538.461001] INFO: task "fsck.":15415 blocked for more than 188 seconds.
> [ 2538.461014] Call trace:
> [ 2538.461016] __switch_to+0x174/0x338
> [ 2538.461021] __schedule+0x628/0x9c4
> [ 2538.461025] schedule+0x7c/0xe8
> [ 2538.461030] blk_queue_enter+0xc4/0x160
> [ 2538.461034] blk_mq_alloc_request+0x120/0x1d4
> [ 2538.461037] scsi_execute_cmd+0x7c/0x23c
> [ 2538.461040] ioctl_internal_command+0x5c/0x164
> [ 2538.461046] scsi_set_medium_removal+0x5c/0xb0
> [ 2538.461051] sd_release+0x50/0x94
> [ 2538.461054] blkdev_put+0x190/0x28c
> [ 2538.461058] blkdev_release+0x28/0x40
> [ 2538.461063] __fput+0xf8/0x2a8
> [ 2538.461066] __fput_sync+0x28/0x5c
> [ 2538.461070] __arm64_sys_close+0x84/0xe8
> [ 2538.461073] invoke_syscall+0x58/0x114
> [ 2538.461078] el0_svc_common+0xac/0xe0
> [ 2538.461082] do_el0_svc+0x1c/0x28
> [ 2538.461087] el0_svc+0x38/0x68
> [ 2538.461090] el0t_64_sync_handler+0x68/0xbc
> [ 2538.461093] el0t_64_sync+0x1a8/0x1ac
>
> T1:                                  T2:
> sd_remove
>  del_gendisk
>   __blk_mark_disk_dead
>    blk_freeze_queue_start
>     ++q->mq_freeze_depth
>                                      bdev_release
>                                       mutex_lock(&disk->open_mutex)
>                                        sd_release
>                                         scsi_execute_cmd
>                                          blk_queue_enter
>                                           wait_event(!q->mq_freeze_depth)
This looks wrong; there is a blk_queue_dying() escape condition in
blk_queue_enter(), so a dying queue should not block here.
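
For reference, the wait in blk_queue_enter() looks roughly like this
(paraphrased from block/blk-core.c, details elided; don't take it as
verbatim):

	while (!blk_try_enter_queue(q, pm)) {
		if (flags & BLK_MQ_REQ_NOWAIT)
			return -EAGAIN;

		smp_rmb();
		wait_event(q->mq_freeze_wq,
			   (!q->mq_freeze_depth &&
			    blk_pm_resume_queue(pm, q)) ||
			   blk_queue_dying(q));
		if (blk_queue_dying(q))
			return -ENODEV;
	}

so once the queue is marked dying, the waiter bails out with -ENODEV
instead of blocking.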
Thanks,
Kuai
>   mutex_lock(&disk->open_mutex)
>
> This is a classic ABBA deadlock. To fix the deadlock, make sure we don't
> try to acquire disk->open_mutex after freezing the queue.
>
> Signed-off-by: Yang Yang <yang.yang@vivo.com>
> ---
> block/genhd.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/genhd.c b/block/genhd.c
> index 8f1f3c6b4d67..c5fca3e893a0 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -663,12 +663,12 @@ void del_gendisk(struct gendisk *disk)
> */
> if (!test_bit(GD_DEAD, &disk->state))
> blk_report_disk_dead(disk, false);
> - __blk_mark_disk_dead(disk);
>
> /*
> * Drop all partitions now that the disk is marked dead.
> */
> mutex_lock(&disk->open_mutex);
> + __blk_mark_disk_dead(disk);
> xa_for_each_start(&disk->part_tbl, idx, part, 1)
> drop_partition(part);
> mutex_unlock(&disk->open_mutex);
>
* Re: [PATCH] block: fix deadlock between sd_remove & sd_release
From: YangYang @ 2024-07-16 9:30 UTC
To: Yu Kuai, yukuai (C); +Cc: Jens Axboe, linux-block, linux-kernel
On 2024/7/16 17:15, Yu Kuai wrote:
> Hi,
>
> On 2024/07/16 16:38, Yang Yang wrote:
>> Our test reports the following hung task:
>>
>> [ 2538.459400] INFO: task "kworker/0:0":7 blocked for more than 188 seconds.
>> [ 2538.459427] Call trace:
>> [ 2538.459430] __switch_to+0x174/0x338
>> [ 2538.459436] __schedule+0x628/0x9c4
>> [ 2538.459442] schedule+0x7c/0xe8
>> [ 2538.459447] schedule_preempt_disabled+0x24/0x40
>> [ 2538.459453] __mutex_lock+0x3ec/0xf04
>> [ 2538.459456] __mutex_lock_slowpath+0x14/0x24
>> [ 2538.459459] mutex_lock+0x30/0xd8
>> [ 2538.459462] del_gendisk+0xdc/0x350
>> [ 2538.459466] sd_remove+0x30/0x60
>> [ 2538.459470] device_release_driver_internal+0x1c4/0x2c4
>> [ 2538.459474] device_release_driver+0x18/0x28
>> [ 2538.459478] bus_remove_device+0x15c/0x174
>> [ 2538.459483] device_del+0x1d0/0x358
>> [ 2538.459488] __scsi_remove_device+0xa8/0x198
>> [ 2538.459493] scsi_forget_host+0x50/0x70
>> [ 2538.459497] scsi_remove_host+0x80/0x180
>> [ 2538.459502] usb_stor_disconnect+0x68/0xf4
>> [ 2538.459506] usb_unbind_interface+0xd4/0x280
>> [ 2538.459510] device_release_driver_internal+0x1c4/0x2c4
>> [ 2538.459514] device_release_driver+0x18/0x28
>> [ 2538.459518] bus_remove_device+0x15c/0x174
>> [ 2538.459523] device_del+0x1d0/0x358
>> [ 2538.459528] usb_disable_device+0x84/0x194
>> [ 2538.459532] usb_disconnect+0xec/0x300
>> [ 2538.459537] hub_event+0xb80/0x1870
>> [ 2538.459541] process_scheduled_works+0x248/0x4dc
>> [ 2538.459545] worker_thread+0x244/0x334
>> [ 2538.459549] kthread+0x114/0x1bc
>>
>> [ 2538.461001] INFO: task "fsck.":15415 blocked for more than 188 seconds.
>> [ 2538.461014] Call trace:
>> [ 2538.461016] __switch_to+0x174/0x338
>> [ 2538.461021] __schedule+0x628/0x9c4
>> [ 2538.461025] schedule+0x7c/0xe8
>> [ 2538.461030] blk_queue_enter+0xc4/0x160
>> [ 2538.461034] blk_mq_alloc_request+0x120/0x1d4
>> [ 2538.461037] scsi_execute_cmd+0x7c/0x23c
>> [ 2538.461040] ioctl_internal_command+0x5c/0x164
>> [ 2538.461046] scsi_set_medium_removal+0x5c/0xb0
>> [ 2538.461051] sd_release+0x50/0x94
>> [ 2538.461054] blkdev_put+0x190/0x28c
>> [ 2538.461058] blkdev_release+0x28/0x40
>> [ 2538.461063] __fput+0xf8/0x2a8
>> [ 2538.461066] __fput_sync+0x28/0x5c
>> [ 2538.461070] __arm64_sys_close+0x84/0xe8
>> [ 2538.461073] invoke_syscall+0x58/0x114
>> [ 2538.461078] el0_svc_common+0xac/0xe0
>> [ 2538.461082] do_el0_svc+0x1c/0x28
>> [ 2538.461087] el0_svc+0x38/0x68
>> [ 2538.461090] el0t_64_sync_handler+0x68/0xbc
>> [ 2538.461093] el0t_64_sync+0x1a8/0x1ac
>>
>> T1:                                  T2:
>> sd_remove
>>  del_gendisk
>>   __blk_mark_disk_dead
>>    blk_freeze_queue_start
>>     ++q->mq_freeze_depth
>>                                      bdev_release
>>                                       mutex_lock(&disk->open_mutex)
>>                                        sd_release
>>                                         scsi_execute_cmd
>>                                          blk_queue_enter
>>                                           wait_event(!q->mq_freeze_depth)
>
> This looks wrong; there is a blk_queue_dying() escape condition in
> blk_queue_enter(), so a dying queue should not block here.
584 static void __blk_mark_disk_dead(struct gendisk *disk)
585 {
......
591
592 if (test_bit(GD_OWNS_QUEUE, &disk->state))
593 blk_queue_flag_set(QUEUE_FLAG_DYING, disk->queue);
SCSI does not set GD_OWNS_QUEUE, so QUEUE_FLAG_DYING is not set in
this scenario.
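
(If I read the allocation paths correctly, that is because sd_probe()
creates the gendisk with blk_mq_alloc_disk_for_queue(), which, unlike
blk_mq_alloc_disk(), does not set GD_OWNS_QUEUE -- the queue's
lifetime is tied to the scsi_device instead. Roughly, from
drivers/scsi/sd.c:

	gd = blk_mq_alloc_disk_for_queue(sdp->request_queue,
					 &sd_bio_compl_lkclass);

so the blk_queue_dying() escape never triggers for sd.)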
Thanks.
>
> Thanks,
> Kuai
>
>>   mutex_lock(&disk->open_mutex)
>>
>> This is a classic ABBA deadlock. To fix the deadlock, make sure we don't
>> try to acquire disk->open_mutex after freezing the queue.
>>
>> Signed-off-by: Yang Yang <yang.yang@vivo.com>
>> ---
>> block/genhd.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/block/genhd.c b/block/genhd.c
>> index 8f1f3c6b4d67..c5fca3e893a0 100644
>> --- a/block/genhd.c
>> +++ b/block/genhd.c
>> @@ -663,12 +663,12 @@ void del_gendisk(struct gendisk *disk)
>> */
>> if (!test_bit(GD_DEAD, &disk->state))
>> blk_report_disk_dead(disk, false);
>> - __blk_mark_disk_dead(disk);
>> /*
>> * Drop all partitions now that the disk is marked dead.
>> */
>> mutex_lock(&disk->open_mutex);
>> + __blk_mark_disk_dead(disk);
>> xa_for_each_start(&disk->part_tbl, idx, part, 1)
>> drop_partition(part);
>> mutex_unlock(&disk->open_mutex);
>>
>
* Re: [PATCH] block: fix deadlock between sd_remove & sd_release
From: Yu Kuai @ 2024-07-17 3:53 UTC
To: YangYang, Yu Kuai; +Cc: Jens Axboe, linux-block, linux-kernel, yukuai (C)
Hi,
On 2024/07/16 17:30, YangYang wrote:
> On 2024/7/16 17:15, Yu Kuai wrote:
>> Hi,
>>
>> On 2024/07/16 16:38, Yang Yang wrote:
>>> Our test reports the following hung task:
>>>
>>> [ 2538.459400] INFO: task "kworker/0:0":7 blocked for more than 188
>>> seconds.
>>> [ 2538.459427] Call trace:
>>> [ 2538.459430] __switch_to+0x174/0x338
>>> [ 2538.459436] __schedule+0x628/0x9c4
>>> [ 2538.459442] schedule+0x7c/0xe8
>>> [ 2538.459447] schedule_preempt_disabled+0x24/0x40
>>> [ 2538.459453] __mutex_lock+0x3ec/0xf04
>>> [ 2538.459456] __mutex_lock_slowpath+0x14/0x24
>>> [ 2538.459459] mutex_lock+0x30/0xd8
>>> [ 2538.459462] del_gendisk+0xdc/0x350
>>> [ 2538.459466] sd_remove+0x30/0x60
>>> [ 2538.459470] device_release_driver_internal+0x1c4/0x2c4
>>> [ 2538.459474] device_release_driver+0x18/0x28
>>> [ 2538.459478] bus_remove_device+0x15c/0x174
>>> [ 2538.459483] device_del+0x1d0/0x358
>>> [ 2538.459488] __scsi_remove_device+0xa8/0x198
>>> [ 2538.459493] scsi_forget_host+0x50/0x70
>>> [ 2538.459497] scsi_remove_host+0x80/0x180
>>> [ 2538.459502] usb_stor_disconnect+0x68/0xf4
>>> [ 2538.459506] usb_unbind_interface+0xd4/0x280
>>> [ 2538.459510] device_release_driver_internal+0x1c4/0x2c4
>>> [ 2538.459514] device_release_driver+0x18/0x28
>>> [ 2538.459518] bus_remove_device+0x15c/0x174
>>> [ 2538.459523] device_del+0x1d0/0x358
>>> [ 2538.459528] usb_disable_device+0x84/0x194
>>> [ 2538.459532] usb_disconnect+0xec/0x300
>>> [ 2538.459537] hub_event+0xb80/0x1870
>>> [ 2538.459541] process_scheduled_works+0x248/0x4dc
>>> [ 2538.459545] worker_thread+0x244/0x334
>>> [ 2538.459549] kthread+0x114/0x1bc
>>>
>>> [ 2538.461001] INFO: task "fsck.":15415 blocked for more than 188
>>> seconds.
>>> [ 2538.461014] Call trace:
>>> [ 2538.461016] __switch_to+0x174/0x338
>>> [ 2538.461021] __schedule+0x628/0x9c4
>>> [ 2538.461025] schedule+0x7c/0xe8
>>> [ 2538.461030] blk_queue_enter+0xc4/0x160
>>> [ 2538.461034] blk_mq_alloc_request+0x120/0x1d4
>>> [ 2538.461037] scsi_execute_cmd+0x7c/0x23c
>>> [ 2538.461040] ioctl_internal_command+0x5c/0x164
>>> [ 2538.461046] scsi_set_medium_removal+0x5c/0xb0
>>> [ 2538.461051] sd_release+0x50/0x94
>>> [ 2538.461054] blkdev_put+0x190/0x28c
>>> [ 2538.461058] blkdev_release+0x28/0x40
>>> [ 2538.461063] __fput+0xf8/0x2a8
>>> [ 2538.461066] __fput_sync+0x28/0x5c
>>> [ 2538.461070] __arm64_sys_close+0x84/0xe8
>>> [ 2538.461073] invoke_syscall+0x58/0x114
>>> [ 2538.461078] el0_svc_common+0xac/0xe0
>>> [ 2538.461082] do_el0_svc+0x1c/0x28
>>> [ 2538.461087] el0_svc+0x38/0x68
>>> [ 2538.461090] el0t_64_sync_handler+0x68/0xbc
>>> [ 2538.461093] el0t_64_sync+0x1a8/0x1ac
>>>
>>> T1:                                  T2:
>>> sd_remove
>>>  del_gendisk
>>>   __blk_mark_disk_dead
>>>    blk_freeze_queue_start
>>>     ++q->mq_freeze_depth
>>>                                      bdev_release
>>>                                       mutex_lock(&disk->open_mutex)
>>>                                        sd_release
>>>                                         scsi_execute_cmd
>>>                                          blk_queue_enter
>>>                                           wait_event(!q->mq_freeze_depth)
>>
>> This looks wrong; there is a blk_queue_dying() escape condition in
>> blk_queue_enter(), so a dying queue should not block here.
>
> 584 static void __blk_mark_disk_dead(struct gendisk *disk)
> 585 {
> ......
> 591
> 592 if (test_bit(GD_OWNS_QUEUE, &disk->state))
> 593 blk_queue_flag_set(QUEUE_FLAG_DYING, disk->queue);
>
> SCSI does not set GD_OWNS_QUEUE, so QUEUE_FLAG_DYING is not set in
> this scenario.
Yes, you're right. Please explain this in the commit message as well.
>
> Thanks.
>
>>
>> Thanks,
>> Kuai
>>
>>>   mutex_lock(&disk->open_mutex)
>>>
>>> This is a classic ABBA deadlock. To fix the deadlock, make sure we don't
>>> try to acquire disk->open_mutex after freezing the queue.
>>>
>>> Signed-off-by: Yang Yang <yang.yang@vivo.com>
>>> ---
>>> block/genhd.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/block/genhd.c b/block/genhd.c
>>> index 8f1f3c6b4d67..c5fca3e893a0 100644
>>> --- a/block/genhd.c
>>> +++ b/block/genhd.c
>>> @@ -663,12 +663,12 @@ void del_gendisk(struct gendisk *disk)
>>> */
>>> if (!test_bit(GD_DEAD, &disk->state))
>>> blk_report_disk_dead(disk, false);
>>> - __blk_mark_disk_dead(disk);
>>> /*
>>> * Drop all partitions now that the disk is marked dead.
>>> */
>>> mutex_lock(&disk->open_mutex);
>>> + __blk_mark_disk_dead(disk);
>>> xa_for_each_start(&disk->part_tbl, idx, part, 1)
>>> drop_partition(part);
>>> mutex_unlock(&disk->open_mutex);
Taking a look at del_gendisk(): between the queue freeze and unfreeze
there is also a device_del(), which waits for all sysfs readers/writers
to be done. That means the related sysfs handlers must never issue IO
that reaches blk_queue_enter(). This implicit requirement doesn't look
reasonable; we never forbade sysfs handlers from doing that.
Then, looking at the scsi sysfs APIs, I notice that scsi_rescan_device()
can be called and will issue IO. Hence, even without 'open_mutex'
involved, the same kind of deadlock still exists:
t1:                                  t2:
                                     store_state_field
                                      scsi_rescan_device
                                       scsi_attach_vpd
                                        scsi_get_vpd_buf
                                         scsi_vpd_inquiry
                                          scsi_execute_cmd
del_gendisk
 bdev_unhash
 blk_freeze_queue_start
                                           blk_queue_enter
 device_del
  kobject_del
   sysfs_remove_dir
I didn't test this, it's just from code review, so I could be wrong.
And I haven't checked all the procedures between freeze and unfreeze
yet.
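
To make the window clearer, the relevant ordering in del_gendisk() is
roughly this (heavily elided from block/genhd.c; double-check against
the actual source):

	__blk_mark_disk_dead(disk);	/* starts the queue freeze via
					 * blk_queue_start_drain() */
	...
	device_del(disk_to_dev(disk));	/* waits for sysfs users */
	...
	blk_mq_freeze_queue_wait(q);
	...
	if (!test_bit(GD_OWNS_QUEUE, &disk->state))
		blk_mq_unfreeze_queue(q);

so anything device_del() waits for must never end up inside
blk_queue_enter().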
Thanks,
Kuai
>>>
>>
* Re: [PATCH] block: fix deadlock between sd_remove & sd_release
From: YangYang @ 2024-07-17 10:15 UTC
To: Yu Kuai; +Cc: Jens Axboe, linux-block, linux-kernel, yukuai (C)
On 2024/7/17 11:53, Yu Kuai wrote:
> Hi,
>
> On 2024/07/16 17:30, YangYang wrote:
>> On 2024/7/16 17:15, Yu Kuai wrote:
>>> Hi,
>>>
>>> On 2024/07/16 16:38, Yang Yang wrote:
>>>> Our test reports the following hung task:
>>>>
>>>> [ 2538.459400] INFO: task "kworker/0:0":7 blocked for more than 188 seconds.
>>>> [ 2538.459427] Call trace:
>>>> [ 2538.459430] __switch_to+0x174/0x338
>>>> [ 2538.459436] __schedule+0x628/0x9c4
>>>> [ 2538.459442] schedule+0x7c/0xe8
>>>> [ 2538.459447] schedule_preempt_disabled+0x24/0x40
>>>> [ 2538.459453] __mutex_lock+0x3ec/0xf04
>>>> [ 2538.459456] __mutex_lock_slowpath+0x14/0x24
>>>> [ 2538.459459] mutex_lock+0x30/0xd8
>>>> [ 2538.459462] del_gendisk+0xdc/0x350
>>>> [ 2538.459466] sd_remove+0x30/0x60
>>>> [ 2538.459470] device_release_driver_internal+0x1c4/0x2c4
>>>> [ 2538.459474] device_release_driver+0x18/0x28
>>>> [ 2538.459478] bus_remove_device+0x15c/0x174
>>>> [ 2538.459483] device_del+0x1d0/0x358
>>>> [ 2538.459488] __scsi_remove_device+0xa8/0x198
>>>> [ 2538.459493] scsi_forget_host+0x50/0x70
>>>> [ 2538.459497] scsi_remove_host+0x80/0x180
>>>> [ 2538.459502] usb_stor_disconnect+0x68/0xf4
>>>> [ 2538.459506] usb_unbind_interface+0xd4/0x280
>>>> [ 2538.459510] device_release_driver_internal+0x1c4/0x2c4
>>>> [ 2538.459514] device_release_driver+0x18/0x28
>>>> [ 2538.459518] bus_remove_device+0x15c/0x174
>>>> [ 2538.459523] device_del+0x1d0/0x358
>>>> [ 2538.459528] usb_disable_device+0x84/0x194
>>>> [ 2538.459532] usb_disconnect+0xec/0x300
>>>> [ 2538.459537] hub_event+0xb80/0x1870
>>>> [ 2538.459541] process_scheduled_works+0x248/0x4dc
>>>> [ 2538.459545] worker_thread+0x244/0x334
>>>> [ 2538.459549] kthread+0x114/0x1bc
>>>>
>>>> [ 2538.461001] INFO: task "fsck.":15415 blocked for more than 188 seconds.
>>>> [ 2538.461014] Call trace:
>>>> [ 2538.461016] __switch_to+0x174/0x338
>>>> [ 2538.461021] __schedule+0x628/0x9c4
>>>> [ 2538.461025] schedule+0x7c/0xe8
>>>> [ 2538.461030] blk_queue_enter+0xc4/0x160
>>>> [ 2538.461034] blk_mq_alloc_request+0x120/0x1d4
>>>> [ 2538.461037] scsi_execute_cmd+0x7c/0x23c
>>>> [ 2538.461040] ioctl_internal_command+0x5c/0x164
>>>> [ 2538.461046] scsi_set_medium_removal+0x5c/0xb0
>>>> [ 2538.461051] sd_release+0x50/0x94
>>>> [ 2538.461054] blkdev_put+0x190/0x28c
>>>> [ 2538.461058] blkdev_release+0x28/0x40
>>>> [ 2538.461063] __fput+0xf8/0x2a8
>>>> [ 2538.461066] __fput_sync+0x28/0x5c
>>>> [ 2538.461070] __arm64_sys_close+0x84/0xe8
>>>> [ 2538.461073] invoke_syscall+0x58/0x114
>>>> [ 2538.461078] el0_svc_common+0xac/0xe0
>>>> [ 2538.461082] do_el0_svc+0x1c/0x28
>>>> [ 2538.461087] el0_svc+0x38/0x68
>>>> [ 2538.461090] el0t_64_sync_handler+0x68/0xbc
>>>> [ 2538.461093] el0t_64_sync+0x1a8/0x1ac
>>>>
>>>> T1:                                  T2:
>>>> sd_remove
>>>>  del_gendisk
>>>>   __blk_mark_disk_dead
>>>>    blk_freeze_queue_start
>>>>     ++q->mq_freeze_depth
>>>>                                      bdev_release
>>>>                                       mutex_lock(&disk->open_mutex)
>>>>                                        sd_release
>>>>                                         scsi_execute_cmd
>>>>                                          blk_queue_enter
>>>>                                           wait_event(!q->mq_freeze_depth)
>>>
>>> This looks wrong; there is a blk_queue_dying() escape condition in
>>> blk_queue_enter(), so a dying queue should not block here.
>>
>> 584 static void __blk_mark_disk_dead(struct gendisk *disk)
>> 585 {
>> ......
>> 591
>> 592 if (test_bit(GD_OWNS_QUEUE, &disk->state))
>> 593 blk_queue_flag_set(QUEUE_FLAG_DYING, disk->queue);
>>
>> SCSI does not set GD_OWNS_QUEUE, so QUEUE_FLAG_DYING is not set in
>> this scenario.
>
> Yes, you're right. Please explain this in the commit message as well.
Got it.
>>
>> Thanks.
>>
>>>
>>> Thanks,
>>> Kuai
>>>
>>>>   mutex_lock(&disk->open_mutex)
>>>>
>>>> This is a classic ABBA deadlock. To fix the deadlock, make sure we don't
>>>> try to acquire disk->open_mutex after freezing the queue.
>>>>
>>>> Signed-off-by: Yang Yang <yang.yang@vivo.com>
>>>> ---
>>>> block/genhd.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/block/genhd.c b/block/genhd.c
>>>> index 8f1f3c6b4d67..c5fca3e893a0 100644
>>>> --- a/block/genhd.c
>>>> +++ b/block/genhd.c
>>>> @@ -663,12 +663,12 @@ void del_gendisk(struct gendisk *disk)
>>>> */
>>>> if (!test_bit(GD_DEAD, &disk->state))
>>>> blk_report_disk_dead(disk, false);
>>>> - __blk_mark_disk_dead(disk);
>>>> /*
>>>> * Drop all partitions now that the disk is marked dead.
>>>> */
>>>> mutex_lock(&disk->open_mutex);
>>>> + __blk_mark_disk_dead(disk);
>>>> xa_for_each_start(&disk->part_tbl, idx, part, 1)
>>>> drop_partition(part);
>>>> mutex_unlock(&disk->open_mutex);
>
> Taking a look at del_gendisk(): between the queue freeze and unfreeze
> there is also a device_del(), which waits for all sysfs readers/writers
> to be done. That means the related sysfs handlers must never issue IO
> that reaches blk_queue_enter(). This implicit requirement doesn't look
> reasonable; we never forbade sysfs handlers from doing that.
>
> Then, looking at the scsi sysfs APIs, I notice that scsi_rescan_device()
> can be called and will issue IO. Hence, even without 'open_mutex'
> involved, the same kind of deadlock still exists:
>
> t1:                                  t2:
>                                      store_state_field
>                                       scsi_rescan_device
>                                        scsi_attach_vpd
>                                         scsi_get_vpd_buf
>                                          scsi_vpd_inquiry
>                                           scsi_execute_cmd
> del_gendisk
>  bdev_unhash
>  blk_freeze_queue_start
>                                            blk_queue_enter
> device_del
>  kobject_del
>   sysfs_remove_dir
>
> I didn't test this, it's just from code review, so I could be wrong.
> And I haven't checked all the procedures between freeze and unfreeze
> yet.
These sysfs nodes are in different directories: the scsi node is
located at /sys/bus/scsi/devices/0:0:0:0 and the gendisk node at
/sys/block/sda. Would it be necessary to wait for the scsi sysfs
nodes' read/write operations to complete before removing the gendisk
sysfs node?
Thanks.
* Re: [PATCH] block: fix deadlock between sd_remove & sd_release
From: Bart Van Assche @ 2024-07-17 16:23 UTC
To: YangYang, Yu Kuai; +Cc: Jens Axboe, linux-block, linux-kernel, yukuai (C)
On 7/17/24 3:15 AM, YangYang wrote:
> These sysfs nodes are in different directories: the scsi node is
> located at /sys/bus/scsi/devices/0:0:0:0 and the gendisk node at
> /sys/block/sda. Would it be necessary to wait for the scsi sysfs
> nodes' read/write operations to complete before removing the gendisk
> sysfs node?
No. sysfs_remove_files() waits for pending read and write operations to
complete.
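
(The mechanism, if I remember the kernfs code correctly, is the
active-reference drain; paraphrased from fs/kernfs/dir.c:

	/* __kernfs_remove() -> kernfs_drain() */
	wait_event(root->deactivate_waitq,
		   atomic_read(&kn->active) == KN_DEACTIVATED_BIAS);

An in-flight show()/store() holds an active reference on the kernfs
node, so removal waits for it to return.)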
Bart.
* Re: [PATCH] block: fix deadlock between sd_remove & sd_release
From: Yu Kuai @ 2024-07-19 3:15 UTC
To: Bart Van Assche, YangYang, Yu Kuai
Cc: Jens Axboe, linux-block, linux-kernel, yukuai (C)
On 2024/07/18 0:23, Bart Van Assche wrote:
> On 7/17/24 3:15 AM, YangYang wrote:
>> These sysfs nodes are in different directories: the scsi node is
>> located at /sys/bus/scsi/devices/0:0:0:0 and the gendisk node at
>> /sys/block/sda. Would it be necessary to wait for the scsi sysfs
>> nodes' read/write operations to complete before removing the gendisk
>> sysfs node?
>
> No. sysfs_remove_files() waits for pending read and write operations to
> complete.
>
So I checked this, and actually the gendisk kobject is a child of the
scsi disk kobject, not the other way around; sda/device is just a
symbolic link:
sda/device -> ../../../14:2:0:0
So the sysfs API for the gendisk itself doesn't issue IO, and as long
as the related driver doesn't create new such APIs under the gendisk,
this won't be a problem.
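
A quick way to see this from userspace (illustration only; assumes a
disk named sda exists):

	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		char buf[256];
		ssize_t n = readlink("/sys/block/sda/device", buf,
				     sizeof(buf) - 1);

		if (n < 0) {
			perror("readlink");
			return 1;
		}
		buf[n] = '\0';
		/* prints something like: ../../../14:2:0:0 */
		printf("/sys/block/sda/device -> %s\n", buf);
		return 0;
	}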
Thanks,
Kuai
> Bart.