* try to avoid del_gendisk vs passthrough from ->release deadlocks v2
@ 2024-10-09 11:38 Christoph Hellwig
2024-10-09 11:38 ` [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead Christoph Hellwig
` (2 more replies)
0 siblings, 3 replies; 28+ messages in thread
From: Christoph Hellwig @ 2024-10-09 11:38 UTC (permalink / raw)
To: Jens Axboe; +Cc: Sergey Senozhatsky, YangYang, linux-block
Hi all,
this is my attempted fix for the problem reported by Sergey in the
"block: del_gendisk() vs blk_queue_enter() race condition" thread. As
I don't have a reproducer this is all just best guess so far, so handle
it with care!
Changes since v1:
- clear the resurrect flag as well at the end of del_gendisk
Diffstat
block/genhd.c | 42 ++++++++++++++++++++++++++++--------------
include/linux/blkdev.h | 1 +
2 files changed, 29 insertions(+), 14 deletions(-)
^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead
  2024-10-09 11:38 try to avoid del_gendisk vs passthrough from ->release deadlocks v2 Christoph Hellwig
@ 2024-10-09 11:38 ` Christoph Hellwig
  2024-10-09 12:31   ` Sergey Senozhatsky
  ` (3 more replies)
  2024-10-09 11:38 ` [PATCH 2/2] block: mark the disk dead before taking open_mutex in del_gendisk Christoph Hellwig
  2024-10-16  2:09 ` try to avoid del_gendisk vs passthrough from ->release deadlocks v2 Sergey Senozhatsky
  2 siblings, 4 replies; 28+ messages in thread
From: Christoph Hellwig @ 2024-10-09 11:38 UTC (permalink / raw)
To: Jens Axboe; +Cc: Sergey Senozhatsky, YangYang, linux-block

When del_gendisk shuts down access to a gendisk, it could lead to a
deadlock with sd or sr, which try to submit passthrough SCSI commands
from their ->release method under open_mutex.  The submission can be
blocked in blk_enter_queue while del_gendisk can't get to actually
telling them to stop and waking them up.

As the disk is going away there is no real point in sending these
commands, but we have no really good way to distinguish between the
cases.  For now mark even standalone (aka SCSI) queues as dying in
del_gendisk to avoid this deadlock, but the real fix will be to split
freeing a disk from freezing a queue for requests not associated with
a disk.
Reported-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 block/genhd.c          | 16 ++++++++++++++--
 include/linux/blkdev.h |  1 +
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 1c05dd4c6980b5..7026569fa8a0be 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -589,8 +589,16 @@ static void __blk_mark_disk_dead(struct gendisk *disk)
 	if (test_and_set_bit(GD_DEAD, &disk->state))
 		return;
 
-	if (test_bit(GD_OWNS_QUEUE, &disk->state))
-		blk_queue_flag_set(QUEUE_FLAG_DYING, disk->queue);
+	/*
+	 * Also mark the disk dead if it is not owned by the gendisk.  This
+	 * means we can't allow /dev/sg passthrough or SCSI internal commands
+	 * while unbinding a ULP.  That is more than just a bit ugly, but until
+	 * we untangle q_usage_counter into one owned by the disk and one owned
+	 * by the queue this is as good as it gets.  The flag will be cleared
+	 * at the end of del_gendisk if it wasn't set before.
+	 */
+	if (!test_and_set_bit(QUEUE_FLAG_DYING, &disk->queue->queue_flags))
+		set_bit(QUEUE_FLAG_RESURRECT, &disk->queue->queue_flags);
 
 	/*
 	 * Stop buffered writers from dirtying pages that can't be written out.
@@ -719,6 +727,10 @@ void del_gendisk(struct gendisk *disk)
 	 * again.  Else leave the queue frozen to fail all I/O.
 	 */
 	if (!test_bit(GD_OWNS_QUEUE, &disk->state)) {
+		if (test_bit(QUEUE_FLAG_RESURRECT, &q->queue_flags)) {
+			clear_bit(QUEUE_FLAG_DYING, &q->queue_flags);
+			clear_bit(QUEUE_FLAG_RESURRECT, &q->queue_flags);
+		}
 		blk_queue_flag_clear(QUEUE_FLAG_INIT_DONE, q);
 		__blk_mq_unfreeze_queue(q, true);
 	} else {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 50c3b959da2816..391e3eb3bb5e61 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -590,6 +590,7 @@ struct request_queue {
 /* Keep blk_queue_flag_name[] in sync with the definitions below */
 enum {
 	QUEUE_FLAG_DYING,		/* queue being torn down */
+	QUEUE_FLAG_RESURRECT,		/* temporarily dying */
 	QUEUE_FLAG_NOMERGES,		/* disable merge attempts */
 	QUEUE_FLAG_SAME_COMP,		/* complete on same CPU-group */
 	QUEUE_FLAG_FAIL_IO,		/* fake timeout */
-- 
2.45.2

^ permalink raw reply related	[flat|nested] 28+ messages in thread
* Re: [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead
  2024-10-09 11:38 ` [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead Christoph Hellwig
@ 2024-10-09 12:31   ` Sergey Senozhatsky
  2024-10-09 12:41     ` Christoph Hellwig
  2024-10-16  4:14   ` YangYang
  ` (2 subsequent siblings)
  3 siblings, 1 reply; 28+ messages in thread
From: Sergey Senozhatsky @ 2024-10-09 12:31 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Jens Axboe, Sergey Senozhatsky, YangYang, linux-block

On (24/10/09 13:38), Christoph Hellwig wrote:
[..]
> @@ -589,8 +589,16 @@ static void __blk_mark_disk_dead(struct gendisk *disk)
>  	if (test_and_set_bit(GD_DEAD, &disk->state))
>  		return;
>  
> -	if (test_bit(GD_OWNS_QUEUE, &disk->state))
> -		blk_queue_flag_set(QUEUE_FLAG_DYING, disk->queue);
> +	/*
> +	 * Also mark the disk dead if it is not owned by the gendisk.  This
> +	 * means we can't allow /dev/sg passthrough or SCSI internal commands
> +	 * while unbinding a ULP.  That is more than just a bit ugly, but until
> +	 * we untangle q_usage_counter into one owned by the disk and one owned
> +	 * by the queue this is as good as it gets.  The flag will be cleared
> +	 * at the end of del_gendisk if it wasn't set before.
> +	 */
> +	if (!test_and_set_bit(QUEUE_FLAG_DYING, &disk->queue->queue_flags))
> +		set_bit(QUEUE_FLAG_RESURRECT, &disk->queue->queue_flags);
> 
>  	/*
>  	 * Stop buffered writers from dirtying pages that can't be written out.
> @@ -719,6 +727,10 @@ void del_gendisk(struct gendisk *disk)
>  	 * again.  Else leave the queue frozen to fail all I/O.
>  	 */
>  	if (!test_bit(GD_OWNS_QUEUE, &disk->state)) {
> +		if (test_bit(QUEUE_FLAG_RESURRECT, &q->queue_flags)) {
> +			clear_bit(QUEUE_FLAG_DYING, &q->queue_flags);
> +			clear_bit(QUEUE_FLAG_RESURRECT, &q->queue_flags);
> +		}

Christoph, shouldn't QUEUE_FLAG_RESURRECT handling be outside of the
GD_OWNS_QUEUE if-block?  Because __blk_mark_disk_dead() sets
QUEUE_FLAG_DYING/QUEUE_FLAG_RESURRECT regardless of GD_OWNS_QUEUE.

// A silly nit: it seems the code uses the blk_queue_flag_set() and
// blk_queue_flag_clear() helpers, but there is no queue_flag_test().
// I don't know what the preference is here - stick to the queue_flag
// helpers, or is it ok to mix them.

> 	blk_queue_flag_clear(QUEUE_FLAG_INIT_DONE, q);
> 	__blk_mq_unfreeze_queue(q, true);

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead
  2024-10-09 12:31   ` Sergey Senozhatsky
@ 2024-10-09 12:41     ` Christoph Hellwig
  2024-10-09 12:43       ` Sergey Senozhatsky
  2024-10-09 13:49       ` Jens Axboe
  0 siblings, 2 replies; 28+ messages in thread
From: Christoph Hellwig @ 2024-10-09 12:41 UTC (permalink / raw)
To: Sergey Senozhatsky; +Cc: Christoph Hellwig, Jens Axboe, YangYang, linux-block

On Wed, Oct 09, 2024 at 09:31:23PM +0900, Sergey Senozhatsky wrote:
> > 	if (!test_bit(GD_OWNS_QUEUE, &disk->state)) {
> > +		if (test_bit(QUEUE_FLAG_RESURRECT, &q->queue_flags)) {
> > +			clear_bit(QUEUE_FLAG_DYING, &q->queue_flags);
> > +			clear_bit(QUEUE_FLAG_RESURRECT, &q->queue_flags);
> > +		}
> 
> Christoph, shouldn't QUEUE_FLAG_RESURRECT handling be outside of the
> GD_OWNS_QUEUE if-block?  Because __blk_mark_disk_dead() sets
> QUEUE_FLAG_DYING/QUEUE_FLAG_RESURRECT regardless of GD_OWNS_QUEUE.

For !GD_OWNS_QUEUE the queue is freed right below, so there isn't much
of a point.

> // A silly nit: it seems the code uses the blk_queue_flag_set() and
> // blk_queue_flag_clear() helpers, but there is no queue_flag_test().
> // I don't know what the preference is here - stick to the queue_flag
> // helpers, or is it ok to mix them.

Yeah.  I looked into a test_and_set wrapper, but then saw how pointless
the existing wrappers are.  So for now this just open codes it, and
once we're done with the fixes I plan to just send a patch to remove
the wrappers entirely.

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead
  2024-10-09 12:41     ` Christoph Hellwig
@ 2024-10-09 12:43       ` Sergey Senozhatsky
  1 sibling, 0 replies; 28+ messages in thread
From: Sergey Senozhatsky @ 2024-10-09 12:43 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Sergey Senozhatsky, Jens Axboe, YangYang, linux-block

On (24/10/09 14:41), Christoph Hellwig wrote:
> On Wed, Oct 09, 2024 at 09:31:23PM +0900, Sergey Senozhatsky wrote:
> > > 	if (!test_bit(GD_OWNS_QUEUE, &disk->state)) {
> > > +		if (test_bit(QUEUE_FLAG_RESURRECT, &q->queue_flags)) {
> > > +			clear_bit(QUEUE_FLAG_DYING, &q->queue_flags);
> > > +			clear_bit(QUEUE_FLAG_RESURRECT, &q->queue_flags);
> > > +		}
> > 
> > Christoph, shouldn't QUEUE_FLAG_RESURRECT handling be outside of the
> > GD_OWNS_QUEUE if-block?  Because __blk_mark_disk_dead() sets
> > QUEUE_FLAG_DYING/QUEUE_FLAG_RESURRECT regardless of GD_OWNS_QUEUE.
> 
> For !GD_OWNS_QUEUE the queue is freed right below, so there isn't much
> of a point.

Oh, right.

> > // A silly nit: it seems the code uses the blk_queue_flag_set() and
> > // blk_queue_flag_clear() helpers, but there is no queue_flag_test().
> > // I don't know what the preference is here - stick to the queue_flag
> > // helpers, or is it ok to mix them.
> 
> Yeah.  I looked into a test_and_set wrapper, but then saw how pointless
> the existing wrappers are.

Likewise.

> So for now this just open codes it, and once we're done with the fixes
> I plan to just send a patch to remove the wrappers entirely.

Ack.

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead
  2024-10-09 12:41     ` Christoph Hellwig
  2024-10-09 12:43       ` Sergey Senozhatsky
@ 2024-10-09 13:49       ` Jens Axboe
  1 sibling, 0 replies; 28+ messages in thread
From: Jens Axboe @ 2024-10-09 13:49 UTC (permalink / raw)
To: Christoph Hellwig, Sergey Senozhatsky; +Cc: YangYang, linux-block

On 10/9/24 6:41 AM, Christoph Hellwig wrote:
>> // A silly nit: it seems the code uses the blk_queue_flag_set() and
>> // blk_queue_flag_clear() helpers, but there is no queue_flag_test().
>> // I don't know what the preference is here - stick to the queue_flag
>> // helpers, or is it ok to mix them.
> 
> Yeah.  I looked into a test_and_set wrapper, but then saw how pointless
> the existing wrappers are.  So for now this just open codes it, and
> once we're done with the fixes I plan to just send a patch to remove
> the wrappers entirely.

Agree, but that's because you didn't do it back when you changed them to
be just set/clear bit operations ;-).  They should definitely just go
away now.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead
  2024-10-09 11:38 ` [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead Christoph Hellwig
  2024-10-09 12:31   ` Sergey Senozhatsky
@ 2024-10-16  4:14   ` YangYang
  2024-10-16 11:09   ` Ming Lei
  2024-10-16 13:35   ` Ming Lei
  3 siblings, 0 replies; 28+ messages in thread
From: YangYang @ 2024-10-16  4:14 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe; +Cc: Sergey Senozhatsky, linux-block

On 2024/10/9 19:38, Christoph Hellwig wrote:
> When del_gendisk shuts down access to a gendisk, it could lead to a
> deadlock with sd or sr, which try to submit passthrough SCSI commands
> from their ->release method under open_mutex.  The submission can be
> blocked in blk_enter_queue while del_gendisk can't get to actually
> telling them to stop and waking them up.
> 
> As the disk is going away there is no real point in sending these
> commands, but we have no really good way to distinguish between the
> cases.  For now mark even standalone (aka SCSI) queues as dying in
> del_gendisk to avoid this deadlock, but the real fix will be to split
> freeing a disk from freezing a queue for requests not associated with
> a disk.
> 
> Reported-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
>  block/genhd.c          | 16 ++++++++++++++--
>  include/linux/blkdev.h |  1 +
>  2 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/block/genhd.c b/block/genhd.c
> index 1c05dd4c6980b5..7026569fa8a0be 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -589,8 +589,16 @@ static void __blk_mark_disk_dead(struct gendisk *disk)
>  	if (test_and_set_bit(GD_DEAD, &disk->state))
>  		return;
>  
> -	if (test_bit(GD_OWNS_QUEUE, &disk->state))
> -		blk_queue_flag_set(QUEUE_FLAG_DYING, disk->queue);
> +	/*
> +	 * Also mark the disk dead if it is not owned by the gendisk.  This
> +	 * means we can't allow /dev/sg passthrough or SCSI internal commands
> +	 * while unbinding a ULP.  That is more than just a bit ugly, but until
> +	 * we untangle q_usage_counter into one owned by the disk and one owned
> +	 * by the queue this is as good as it gets.  The flag will be cleared
> +	 * at the end of del_gendisk if it wasn't set before.
> +	 */
> +	if (!test_and_set_bit(QUEUE_FLAG_DYING, &disk->queue->queue_flags))
> +		set_bit(QUEUE_FLAG_RESURRECT, &disk->queue->queue_flags);
> 
>  	/*
>  	 * Stop buffered writers from dirtying pages that can't be written out.
> @@ -719,6 +727,10 @@ void del_gendisk(struct gendisk *disk)
>  	 * again.  Else leave the queue frozen to fail all I/O.
>  	 */
>  	if (!test_bit(GD_OWNS_QUEUE, &disk->state)) {
> +		if (test_bit(QUEUE_FLAG_RESURRECT, &q->queue_flags)) {
> +			clear_bit(QUEUE_FLAG_DYING, &q->queue_flags);
> +			clear_bit(QUEUE_FLAG_RESURRECT, &q->queue_flags);
> +		}
>  		blk_queue_flag_clear(QUEUE_FLAG_INIT_DONE, q);
>  		__blk_mq_unfreeze_queue(q, true);
>  	} else {
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 50c3b959da2816..391e3eb3bb5e61 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -590,6 +590,7 @@ struct request_queue {
>  /* Keep blk_queue_flag_name[] in sync with the definitions below */
>  enum {
>  	QUEUE_FLAG_DYING,		/* queue being torn down */
> +	QUEUE_FLAG_RESURRECT,		/* temporarily dying */
>  	QUEUE_FLAG_NOMERGES,		/* disable merge attempts */
>  	QUEUE_FLAG_SAME_COMP,		/* complete on same CPU-group */
>  	QUEUE_FLAG_FAIL_IO,		/* fake timeout */

Looks good.  Feel free to add:

Reviewed-by: Yang Yang <yang.yang@vivo.com>

Thanks.

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead
  2024-10-09 11:38 ` [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead Christoph Hellwig
  2024-10-09 12:31   ` Sergey Senozhatsky
  2024-10-16  4:14   ` YangYang
@ 2024-10-16 11:09   ` Ming Lei
  2024-10-16 12:32     ` Christoph Hellwig
  2024-10-16 13:35   ` Ming Lei
  3 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2024-10-16 11:09 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Jens Axboe, Sergey Senozhatsky, YangYang, linux-block

On Wed, Oct 09, 2024 at 01:38:20PM +0200, Christoph Hellwig wrote:
> When del_gendisk shuts down access to a gendisk, it could lead to a
> deadlock with sd or sr, which try to submit passthrough SCSI commands
> from their ->release method under open_mutex.  The submission can be
> blocked in blk_enter_queue while del_gendisk can't get to actually
> telling them to stop and waking them up.
> 
> As the disk is going away there is no real point in sending these
> commands, but we have no really good way to distinguish between the
> cases.  For now mark even standalone (aka SCSI) queues as dying in
> del_gendisk to avoid this deadlock, but the real fix will be to split
> freeing a disk from freezing a queue for requests not associated with
> a disk.
> 
> Reported-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
>  block/genhd.c          | 16 ++++++++++++++--
>  include/linux/blkdev.h |  1 +
>  2 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/block/genhd.c b/block/genhd.c
> index 1c05dd4c6980b5..7026569fa8a0be 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -589,8 +589,16 @@ static void __blk_mark_disk_dead(struct gendisk *disk)
>  	if (test_and_set_bit(GD_DEAD, &disk->state))
>  		return;
>  
> -	if (test_bit(GD_OWNS_QUEUE, &disk->state))
> -		blk_queue_flag_set(QUEUE_FLAG_DYING, disk->queue);
> +	/*
> +	 * Also mark the disk dead if it is not owned by the gendisk.  This
> +	 * means we can't allow /dev/sg passthrough or SCSI internal commands
> +	 * while unbinding a ULP.  That is more than just a bit ugly, but until
> +	 * we untangle q_usage_counter into one owned by the disk and one owned
> +	 * by the queue this is as good as it gets.  The flag will be cleared
> +	 * at the end of del_gendisk if it wasn't set before.
> +	 */
> +	if (!test_and_set_bit(QUEUE_FLAG_DYING, &disk->queue->queue_flags))
> +		set_bit(QUEUE_FLAG_RESURRECT, &disk->queue->queue_flags);

Setting QUEUE_FLAG_DYING may fail passthrough requests for
!GD_OWNS_QUEUE; I guess this may cause a SCSI regression.

blk_queue_enter() needs to wait until RESURRECT & DYING are cleared
instead of returning failure.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead
  2024-10-16 11:09   ` Ming Lei
@ 2024-10-16 12:32     ` Christoph Hellwig
  2024-10-16 12:49       ` Ming Lei
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2024-10-16 12:32 UTC (permalink / raw)
To: Ming Lei
Cc: Christoph Hellwig, Jens Axboe, Sergey Senozhatsky, YangYang,
	linux-block

On Wed, Oct 16, 2024 at 07:09:48PM +0800, Ming Lei wrote:
> Setting QUEUE_FLAG_DYING may fail passthrough requests for
> !GD_OWNS_QUEUE; I guess this may cause a SCSI regression.

Yes, as clearly documented in the commit log.

> blk_queue_enter() needs to wait until RESURRECT & DYING are cleared
> instead of returning failure.

What we really need is to split the enter conditions between the disk
and the standalone queue.  But until then I think the current version
is reasonable enough.

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead
  2024-10-16 12:32     ` Christoph Hellwig
@ 2024-10-16 12:49       ` Ming Lei
  0 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2024-10-16 12:49 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Sergey Senozhatsky, YangYang, linux-block, linux-scsi,
	Martin K. Petersen

On Wed, Oct 16, 2024 at 02:32:40PM +0200, Christoph Hellwig wrote:
> On Wed, Oct 16, 2024 at 07:09:48PM +0800, Ming Lei wrote:
> > Setting QUEUE_FLAG_DYING may fail passthrough requests for
> > !GD_OWNS_QUEUE; I guess this may cause a SCSI regression.
> 
> Yes, as clearly documented in the commit log.

The change needs to Cc linux-scsi.

> As the disk is going away there is no real point in sending these
> commands, but we have no really good way to distinguish between the
> cases.

The SCSI request queue has a very different lifetime from the gendisk,
so I am not sure the above comment is correct.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead
  2024-10-09 11:38 ` [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead Christoph Hellwig
  ` (2 preceding siblings ...)
  2024-10-16 11:09   ` Ming Lei
@ 2024-10-16 13:35   ` Ming Lei
  2024-10-19  1:25     ` Sergey Senozhatsky
  3 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2024-10-16 13:35 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Jens Axboe, Sergey Senozhatsky, YangYang, linux-block

On Wed, Oct 09, 2024 at 01:38:20PM +0200, Christoph Hellwig wrote:
> When del_gendisk shuts down access to a gendisk, it could lead to a
> deadlock with sd or sr, which try to submit passthrough SCSI commands
> from their ->release method under open_mutex.  The submission can be
> blocked in blk_enter_queue while del_gendisk can't get to actually
> telling them to stop and waking them up.

When ->release() waits in blk_enter_queue(), the following code block

	mutex_lock(&disk->open_mutex);
	__blk_mark_disk_dead(disk);
	xa_for_each_start(&disk->part_tbl, idx, part, 1)
		drop_partition(part);
	mutex_unlock(&disk->open_mutex);

in del_gendisk() should have been done.  Then del_gendisk() should move
on and finally unfreeze the queue, so I still don't see how the above
deadlock is triggered.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead
  2024-10-16 13:35   ` Ming Lei
@ 2024-10-19  1:25     ` Sergey Senozhatsky
  2024-10-19 12:32       ` Ming Lei
  0 siblings, 1 reply; 28+ messages in thread
From: Sergey Senozhatsky @ 2024-10-19  1:25 UTC (permalink / raw)
To: Ming Lei
Cc: Christoph Hellwig, Jens Axboe, Sergey Senozhatsky, YangYang,
	linux-block

On (24/10/16 21:35), Ming Lei wrote:
> On Wed, Oct 09, 2024 at 01:38:20PM +0200, Christoph Hellwig wrote:
> > When del_gendisk shuts down access to a gendisk, it could lead to a
> > deadlock with sd or sr, which try to submit passthrough SCSI commands
> > from their ->release method under open_mutex.  The submission can be
> > blocked in blk_enter_queue while del_gendisk can't get to actually
> > telling them to stop and waking them up.
> 
> When ->release() waits in blk_enter_queue(), the following code block
> 
> 	mutex_lock(&disk->open_mutex);
> 	__blk_mark_disk_dead(disk);
> 	xa_for_each_start(&disk->part_tbl, idx, part, 1)
> 		drop_partition(part);
> 	mutex_unlock(&disk->open_mutex);

blk_enter_queue()->schedule() holds ->open_mutex, so that block of code
sleeps on ->open_mutex.  We can't drain under ->open_mutex.

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead
  2024-10-19  1:25     ` Sergey Senozhatsky
@ 2024-10-19 12:32       ` Ming Lei
  2024-10-19 12:37         ` Sergey Senozhatsky
  0 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2024-10-19 12:32 UTC (permalink / raw)
To: Sergey Senozhatsky; +Cc: Christoph Hellwig, Jens Axboe, YangYang, linux-block

On Sat, Oct 19, 2024 at 10:25:41AM +0900, Sergey Senozhatsky wrote:
> On (24/10/16 21:35), Ming Lei wrote:
> > When ->release() waits in blk_enter_queue(), the following code block
> > 
> > 	mutex_lock(&disk->open_mutex);
> > 	__blk_mark_disk_dead(disk);
> > 	xa_for_each_start(&disk->part_tbl, idx, part, 1)
> > 		drop_partition(part);
> > 	mutex_unlock(&disk->open_mutex);
> 
> blk_enter_queue()->schedule() holds ->open_mutex, so that block of code
> sleeps on ->open_mutex.  We can't drain under ->open_mutex.

We haven't started to drain yet, so why does blk_enter_queue() sleep,
and what is it waiting for?

Thanks,
Ming

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead
  2024-10-19 12:32       ` Ming Lei
@ 2024-10-19 12:37         ` Sergey Senozhatsky
  2024-10-19 12:50           ` Ming Lei
  0 siblings, 1 reply; 28+ messages in thread
From: Sergey Senozhatsky @ 2024-10-19 12:37 UTC (permalink / raw)
To: Ming Lei
Cc: Sergey Senozhatsky, Christoph Hellwig, Jens Axboe, YangYang,
	linux-block

On (24/10/19 20:32), Ming Lei wrote:
[..]
> > > When ->release() waits in blk_enter_queue(), the following code block
> > > 
> > > 	mutex_lock(&disk->open_mutex);
> > > 	__blk_mark_disk_dead(disk);
> > > 	xa_for_each_start(&disk->part_tbl, idx, part, 1)
> > > 		drop_partition(part);
> > > 	mutex_unlock(&disk->open_mutex);
> > 
> > blk_enter_queue()->schedule() holds ->open_mutex, so that block of code
> > sleeps on ->open_mutex.  We can't drain under ->open_mutex.
> 
> We haven't started to drain yet, so why does blk_enter_queue() sleep,
> and what is it waiting for?

Unfortunately I don't have a device to repro this, but it happens to a
number of our customers (using different peripheral devices, but, as far
as I can tell, all running the 6.6 kernel).

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead
  2024-10-19 12:37         ` Sergey Senozhatsky
@ 2024-10-19 12:50           ` Ming Lei
  2024-10-19 12:58             ` Sergey Senozhatsky
  0 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2024-10-19 12:50 UTC (permalink / raw)
To: Sergey Senozhatsky; +Cc: Christoph Hellwig, Jens Axboe, YangYang, linux-block

On Sat, Oct 19, 2024 at 09:37:27PM +0900, Sergey Senozhatsky wrote:
> On (24/10/19 20:32), Ming Lei wrote:
> [..]
> > We haven't started to drain yet, so why does blk_enter_queue() sleep,
> > and what is it waiting for?
> 
> Unfortunately I don't have a device to repro this, but it happens to a
> number of our customers (using different peripheral devices, but, as far
> as I can tell, all running the 6.6 kernel).

I can understand the issue on v6.6 because it doesn't have commit
7e04da2dc701 ("block: fix deadlock between sd_remove & sd_release").

But for the latest upstream, I don't see how it can happen.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead
  2024-10-19 12:50           ` Ming Lei
@ 2024-10-19 12:58             ` Sergey Senozhatsky
  2024-10-19 13:09               ` Ming Lei
  0 siblings, 1 reply; 28+ messages in thread
From: Sergey Senozhatsky @ 2024-10-19 12:58 UTC (permalink / raw)
To: Ming Lei
Cc: Sergey Senozhatsky, Christoph Hellwig, Jens Axboe, YangYang,
	linux-block

On (24/10/19 20:50), Ming Lei wrote:
> On Sat, Oct 19, 2024 at 09:37:27PM +0900, Sergey Senozhatsky wrote:
> > On (24/10/19 20:32), Ming Lei wrote:
> > [..]
> > Unfortunately I don't have a device to repro this, but it happens to a
> > number of our customers (using different peripheral devices, but, as far
> > as I can tell, all running the 6.6 kernel).
> 
> I can understand the issue on v6.6 because it doesn't have commit
> 7e04da2dc701 ("block: fix deadlock between sd_remove & sd_release").

We have that one in 6.6, as far as I can tell:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/block/genhd.c?h=v6.6.57#n663

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead
  2024-10-19 12:58             ` Sergey Senozhatsky
@ 2024-10-19 13:09               ` Ming Lei
  2024-10-19 13:50                 ` Sergey Senozhatsky
  0 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2024-10-19 13:09 UTC (permalink / raw)
To: Sergey Senozhatsky; +Cc: Christoph Hellwig, Jens Axboe, YangYang, linux-block

On Sat, Oct 19, 2024 at 09:58:04PM +0900, Sergey Senozhatsky wrote:
> On (24/10/19 20:50), Ming Lei wrote:
> > On Sat, Oct 19, 2024 at 09:37:27PM +0900, Sergey Senozhatsky wrote:
> > > On (24/10/19 20:32), Ming Lei wrote:
> > > [..]
> > > Unfortunately I don't have a device to repro this, but it happens to a
> > > number of our customers (using different peripheral devices, but, as far
> > > as I can tell, all running the 6.6 kernel).
> > 
> > I can understand the issue on v6.6 because it doesn't have commit
> > 7e04da2dc701 ("block: fix deadlock between sd_remove & sd_release").
> 
> We have that one in 6.6, as far as I can tell:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/block/genhd.c?h=v6.6.57#n663

Then we need to root-cause it first.

If you can reproduce it, please provide the dmesg log and the stack
traces of the deadlocked processes, collected via sysrq.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead
  2024-10-19 13:09               ` Ming Lei
@ 2024-10-19 13:50                 ` Sergey Senozhatsky
  2024-10-19 15:03                   ` Ming Lei
  0 siblings, 1 reply; 28+ messages in thread
From: Sergey Senozhatsky @ 2024-10-19 13:50 UTC (permalink / raw)
To: Ming Lei
Cc: Sergey Senozhatsky, Christoph Hellwig, Jens Axboe, YangYang,
	linux-block

On (24/10/19 21:09), Ming Lei wrote:
> On Sat, Oct 19, 2024 at 09:58:04PM +0900, Sergey Senozhatsky wrote:
[..]
> Then we need to root-cause it first.
> 
> If you can reproduce it

I cannot.

All I have are backtraces from various crash reports; I posted some of
them earlier [1] (and in that entire thread).  This looks like a
close()->bio_queue_enter() vs usb_disconnect()->del_gendisk() deadlock,
and del_gendisk() cannot drain.  Doing the drain under the same lock
that the things we want to drain currently hold looks troublesome in
general.

[1] https://lore.kernel.org/linux-block/20241008051948.GB10794@google.com

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead
  2024-10-19 13:50                 ` Sergey Senozhatsky
@ 2024-10-19 15:03                   ` Ming Lei
  2024-10-19 15:11                     ` Sergey Senozhatsky
  ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Ming Lei @ 2024-10-19 15:03 UTC (permalink / raw)
To: Sergey Senozhatsky; +Cc: Christoph Hellwig, Jens Axboe, YangYang, linux-block

On Sat, Oct 19, 2024 at 10:50:10PM +0900, Sergey Senozhatsky wrote:
> On (24/10/19 21:09), Ming Lei wrote:
> > On Sat, Oct 19, 2024 at 09:58:04PM +0900, Sergey Senozhatsky wrote:
> [..]
> > Then we need to root-cause it first.
> > 
> > If you can reproduce it
> 
> I cannot.
> 
> All I have are backtraces from various crash reports; I posted some of
> them earlier [1] (and in that entire thread).  This looks like a
> close()->bio_queue_enter() vs usb_disconnect()->del_gendisk() deadlock,
> and del_gendisk() cannot drain.  Doing the drain under the same lock
> that the things we want to drain currently hold looks troublesome in
> general.
> 
> [1] https://lore.kernel.org/linux-block/20241008051948.GB10794@google.com

Probably bio_queue_enter() waits for runtime PM: the queue is in
->pm_only state, and BLK_MQ_REQ_PM isn't actually passed from
ioctl_internal_command() <- scsi_set_medium_removal().

And if you have a vmcore collected, it shouldn't be hard to root-cause.

Also I'd suggest collecting the intact related dmesg log in future,
instead of providing a selective log; there isn't even a kernel
version...

Thanks,
Ming

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead
  2024-10-19 15:03                   ` Ming Lei
@ 2024-10-19 15:11                     ` Sergey Senozhatsky
  0 siblings, 0 replies; 28+ messages in thread
From: Sergey Senozhatsky @ 2024-10-19 15:11 UTC (permalink / raw)
To: Ming Lei
Cc: Sergey Senozhatsky, Christoph Hellwig, Jens Axboe, YangYang,
	linux-block

On (24/10/19 23:03), Ming Lei wrote:
> Probably bio_queue_enter() waits for runtime PM: the queue is in
> ->pm_only state, and BLK_MQ_REQ_PM isn't actually passed from
> ioctl_internal_command() <- scsi_set_medium_removal().
> 
> And if you have a vmcore collected, it shouldn't be hard to root-cause.

We don't collect those.

> Also I'd suggest collecting the intact related dmesg log in future,
> instead of providing a selective log; there isn't even a kernel
> version...

These "selected" backtraces are the only backtraces in the dmesg.  I
literally have reports that have just two backtraces of tasks blocked
over 120 seconds: one close()->bio_queue_enter()->schedule() (under
->open_mutex) and the other del_gendisk()->mutex_lock()->schedule().

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead
  2024-10-19 15:03 ` Ming Lei
@ 2024-10-19 15:40 ` Sergey Senozhatsky
  0 siblings, 0 replies; 28+ messages in thread
From: Sergey Senozhatsky @ 2024-10-19 15:40 UTC
To: Ming Lei
Cc: Sergey Senozhatsky, Christoph Hellwig, Jens Axboe, YangYang, linux-block

On (24/10/19 23:03), Ming Lei wrote:
> > there isn't even a kernel version...

Well, that's on me, yes, I admit it. I completely missed that, but it
was never a secret [1]. I probably missed it because I would not have
reached out to upstream with a 5.4 bug report; and 6.6, in that part of
the code, looked quite close to upstream. But yes, I forgot to add the
kernel version.

[1] https://lore.kernel.org/linux-block/20241003135504.GL11458@google.com
* Re: [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead
  2024-10-19 15:03 ` Ming Lei
@ 2024-10-28  5:44 ` Sergey Senozhatsky
  0 siblings, 0 replies; 28+ messages in thread
From: Sergey Senozhatsky @ 2024-10-28  5:44 UTC
To: Ming Lei
Cc: Sergey Senozhatsky, Christoph Hellwig, Jens Axboe, YangYang, linux-block

On (24/10/19 23:03), Ming Lei wrote:
> On Sat, Oct 19, 2024 at 10:50:10PM +0900, Sergey Senozhatsky wrote:
> > On (24/10/19 21:09), Ming Lei wrote:
> > > On Sat, Oct 19, 2024 at 09:58:04PM +0900, Sergey Senozhatsky wrote:
> > > > On (24/10/19 20:50), Ming Lei wrote:
> > > > > On Sat, Oct 19, 2024 at 09:37:27PM +0900, Sergey Senozhatsky wrote:
> > [..]
> Probably bio_queue_enter() waits for runtime PM: the queue is in
> ->pm_only state, and BLK_MQ_REQ_PM isn't actually passed from
> ioctl_internal_command() <- scsi_set_medium_removal().

Sorry for the delay. Another report. I see lots of buffer I/O errors:

<6>[  364.268167] usb-storage 3-3:1.0: USB Mass Storage device detected
<6>[  364.268551] scsi host3: usb-storage 3-3:1.0
<3>[  364.274806] Buffer I/O error on dev sdc1, logical block 0, lost async page write
<5>[  365.318424] scsi 3:0:0:0: Direct-Access     VendorCo ProductCode      2.00 PQ: 0 ANSI: 4
<5>[  365.319898] sd 3:0:0:0: [sdc] 122880000 512-byte logical blocks: (62.9 GB/58.6 GiB)
<5>[  365.320077] sd 3:0:0:0: [sdc] Write Protect is off
<7>[  365.320085] sd 3:0:0:0: [sdc] Mode Sense: 03 00 00 00
<4>[  365.320255] sd 3:0:0:0: [sdc] No Caching mode page found
<4>[  365.320262] sd 3:0:0:0: [sdc] Assuming drive cache: write through
<6>[  365.322483] sdc: sdc1
<5>[  365.323130] sd 3:0:0:0: [sdc] Attached SCSI removable disk
<6>[  369.083225] usb 3-3: USB disconnect, device number 49

Then PM suspend/resume. After resume:

<7>[ 1338.847937] PM: resume of devices complete after 291.422 msecs
<6>[ 1338.854215] OOM killer enabled.
<6>[ 1338.854235] Restarting tasks ...
<6>[ 1338.854797] mei_hdcp 0000:00:16.0-(UUID: 7): bound 0000:00:02.0 (ops 0xffffffffb8f03e50)
<6>[ 1338.857745] mei_pxp 0000:00:16.0-(UUID: 2): bound 0000:00:02.0 (ops 0xffffffffb8f16a80)
<4>[ 1338.859663] done.
<5>[ 1338.859683] random: crng reseeded on system resumption
<12>[ 1338.868200] init: cupsd main process ended, respawning
<6>[ 1338.868541] Resume caused by IRQ 9, acpi
<6>[ 1338.868549] Resume caused by IRQ 98, chromeos-ec
<6>[ 1338.868555] PM: suspend exit

Then lots of buffer I/O errors again and eventually a deadlock. The
deadlock happens much later than 120 seconds after resume, so I cannot
directly connect those events.

[..]
<6>[ 1859.660882] usb-storage 3-3:1.0: USB Mass Storage device detected
<6>[ 1859.661457] scsi host4: usb-storage 3-3:1.0
<3>[ 1859.668180] Buffer I/O error on dev sdd1, logical block 0, lost async page write
<5>[ 1860.697826] scsi 4:0:0:0: Direct-Access     VendorCo ProductCode      2.00 PQ: 0 ANSI: 4
<5>[ 1860.699222] sd 4:0:0:0: [sdd] 122880000 512-byte logical blocks: (62.9 GB/58.6 GiB)
<5>[ 1860.699373] sd 4:0:0:0: [sdd] Write Protect is off
<7>[ 1860.699380] sd 4:0:0:0: [sdd] Mode Sense: 03 00 00 00
<4>[ 1860.699522] sd 4:0:0:0: [sdd] No Caching mode page found
<4>[ 1860.699526] sd 4:0:0:0: [sdd] Assuming drive cache: write through
<6>[ 1860.701393] sdd: sdd1
<5>[ 1860.701886] sd 4:0:0:0: [sdd] Attached SCSI removable disk
<6>[ 1862.077109] usb 3-3: USB disconnect, device number 110
<6>[ 1862.338159] usb 3-3: new high-speed USB device number 111 using xhci_hcd
<6>[ 1862.468090] usb 3-3: New USB device found, idVendor=346d, idProduct=5678, bcdDevice= 2.00
<6>[ 1862.468105] usb 3-3: New USB device strings: Mfr=1, Product=2, SerialNumber=(Serial: 8)
<6>[ 1862.468111] usb 3-3: Product: Disk 2.0
<6>[ 1862.468115] usb 3-3: Manufacturer: USB
<6>[ 1862.468119] usb 3-3: SerialNumber: (Serial: 9)
<6>[ 1862.469962] usb-storage 3-3:1.0: USB Mass Storage device detected
<6>[ 1862.470642] scsi host3: usb-storage 3-3:1.0
<3>[ 1862.476447] Buffer I/O error on dev sdd1, logical block 0, lost async page write
<5>[ 1863.514018] scsi 3:0:0:0: Direct-Access     VendorCo ProductCode      2.00 PQ: 0 ANSI: 4
<5>[ 1863.515489] sd 3:0:0:0: [sdd] 122880000 512-byte logical blocks: (62.9 GB/58.6 GiB)
<5>[ 1863.515640] sd 3:0:0:0: [sdd] Write Protect is off
<7>[ 1863.515646] sd 3:0:0:0: [sdd] Mode Sense: 03 00 00 00
<4>[ 1863.515797] sd 3:0:0:0: [sdd] No Caching mode page found
<4>[ 1863.515802] sd 3:0:0:0: [sdd] Assuming drive cache: write through
<6>[ 1863.518227] sdd: sdd1
<5>[ 1863.518551] sd 3:0:0:0: [sdd] Attached SCSI removable disk
<6>[ 1865.018356] usb 3-3: USB disconnect, device number 111
<6>[ 1865.285091] usb 3-3: new high-speed USB device number 112 using xhci_hcd
<3>[ 1865.605088] usb 3-3: device descriptor read/64, error -71
<6>[ 1865.844873] usb 3-3: New USB device found, idVendor=346d, idProduct=5678, bcdDevice= 2.00
<6>[ 1865.844892] usb 3-3: New USB device strings: Mfr=1, Product=2, SerialNumber=(Serial: 8)
<6>[ 1865.844898] usb 3-3: Product: Disk 2.0
<6>[ 1865.844903] usb 3-3: Manufacturer: USB
<6>[ 1865.844906] usb 3-3: SerialNumber: (Serial: 9)
<6>[ 1865.847205] usb-storage 3-3:1.0: USB Mass Storage device detected
<6>[ 1865.847806] scsi host4: usb-storage 3-3:1.0
<3>[ 1865.853941] Buffer I/O error on dev sdd1, logical block 0, lost async page write
<6>[ 1866.436729] usb 3-3: USB disconnect, device number 112
<6>[ 1866.700998] usb 3-3: new high-speed USB device number 113 using xhci_hcd
<6>[ 1866.829449] usb 3-3: New USB device found, idVendor=346d, idProduct=5678, bcdDevice= 2.00
<6>[ 1866.829466] usb 3-3: New USB device strings: Mfr=1, Product=2, SerialNumber=(Serial: 8)
<6>[ 1866.829473] usb 3-3: Product: Disk 2.0
<6>[ 1866.829478] usb 3-3: Manufacturer: USB
<6>[ 1866.829482] usb 3-3: SerialNumber: (Serial: 9)
<6>[ 1866.831605] usb-storage 3-3:1.0: USB Mass Storage device detected
<6>[ 1866.832173] scsi host3: usb-storage 3-3:1.0
<5>[ 1867.866118] scsi 3:0:0:0: Direct-Access     VendorCo ProductCode      2.00 PQ: 0 ANSI: 4
<5>[ 1867.868213] sd 3:0:0:0: [sdd] 122880000 512-byte logical blocks: (62.9 GB/58.6 GiB)
<5>[ 1867.868604] sd 3:0:0:0: [sdd] Write Protect is off
<7>[ 1867.868616] sd 3:0:0:0: [sdd] Mode Sense: 03 00 00 00
<4>[ 1867.869071] sd 3:0:0:0: [sdd] No Caching mode page found
<4>[ 1867.869081] sd 3:0:0:0: [sdd] Assuming drive cache: write through
<6>[ 1867.871429] sdd: sdd1
<5>[ 1867.871857] sd 3:0:0:0: [sdd] Attached SCSI removable disk
<6>[ 1868.423593] usb 3-3: USB disconnect, device number 113
<6>[ 1868.431172] sdd: detected capacity change from 122880000 to 0
<28>[ 1928.670962] udevd[203]: sdd: Worker [9839] processing SEQNUM=6508 is taking a long time
<3>[ 2004.633104] INFO: task kworker/0:3:187 blocked for more than 122 seconds.
<3>[ 2004.633125]       Tainted: G     U    6.6.41-03520-gd3d77f15f842 #1
<3>[ 2004.633131] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[ 2004.633149] task:kworker/0:3     state:D stack:0     pid:187   ppid:2      flags:0x00004000
<6>[ 2004.633149] Workqueue: usb_hub_wq hub_event
<6>[ 2004.633166] Call Trace:
<6>[ 2004.633172]  <TASK>
<6>[ 2004.633179]  schedule+0x4f4/0x1540
<6>[ 2004.633190]  ? default_wake_function+0x388/0xcd0
<6>[ 2004.633200]  schedule_preempt_disabled+0x15/0x30
<6>[ 2004.633206]  __mutex_lock_slowpath+0x2b5/0x4d0
<6>[ 2004.633212]  del_gendisk+0x136/0x370
<6>[ 2004.633222]  sd_remove+0x30/0x60
<6>[ 2004.633230]  device_release_driver_internal+0x1a2/0x2a0
<6>[ 2004.633239]  bus_remove_device+0x154/0x180
<6>[ 2004.633248]  device_del+0x207/0x370
<6>[ 2004.633256]  ? __pfx_transport_remove_classdev+0x10/0x10
<6>[ 2004.633264]  ? attribute_container_device_trigger+0xe3/0x110
<6>[ 2004.633272]  __scsi_remove_device+0xc0/0x170
<6>[ 2004.633279]  scsi_forget_host+0x45/0x60
<6>[ 2004.633287]  scsi_remove_host+0x87/0x170
<6>[ 2004.633295]  usb_stor_disconnect+0x63/0xb0
<6>[ 2004.633302]  usb_unbind_interface+0xbe/0x250
<6>[ 2004.633309]  device_release_driver_internal+0x1a2/0x2a0
<6>[ 2004.633315]  bus_remove_device+0x154/0x180
<6>[ 2004.633322]  device_del+0x207/0x370
<6>[ 2004.633328]  ? kobject_release+0x56/0xb0
<6>[ 2004.633336]  usb_disable_device+0x72/0x170
<6>[ 2004.633342]  usb_disconnect+0xeb/0x280
<6>[ 2004.633350]  hub_event+0xac7/0x1760
<6>[ 2004.633359]  worker_thread+0x355/0x900
<6>[ 2004.633367]  kthread+0xed/0x110
<6>[ 2004.633374]  ? __pfx_worker_thread+0x10/0x10
<6>[ 2004.633381]  ? __pfx_kthread+0x10/0x10
<6>[ 2004.633387]  ret_from_fork+0x38/0x50
<6>[ 2004.633393]  ? __pfx_kthread+0x10/0x10
<6>[ 2004.633399]  ret_from_fork_asm+0x1b/0x30
<6>[ 2004.633407]  </TASK>
<3>[ 2004.633496] INFO: task cros-disks:1614 blocked for more than 122 seconds.
<3>[ 2004.633502]       Tainted: G     U    6.6.41-03520-gd3d77f15f842 #1
<3>[ 2004.633506] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[ 2004.633519] task:cros-disks      state:D stack:0     pid:1614  ppid:1      flags:0x00004002
<6>[ 2004.633519] Call Trace:
<6>[ 2004.633523]  <TASK>
<6>[ 2004.633527]  schedule+0x4f4/0x1540
<6>[ 2004.633533]  ? xas_store+0xc57/0xcc0
<6>[ 2004.633539]  ? lru_add_drain+0x4d8/0x6e0
<6>[ 2004.633548]  blk_queue_enter+0x172/0x250
<6>[ 2004.633557]  ? __pfx_autoremove_wake_function+0x10/0x10
<6>[ 2004.633565]  blk_mq_alloc_request+0x167/0x210
<6>[ 2004.633573]  scsi_execute_cmd+0x65/0x240
<6>[ 2004.633580]  ioctl_internal_command+0x6c/0x150
<6>[ 2004.633590]  scsi_set_medium_removal+0x63/0xc0
<6>[ 2004.633598]  sd_release+0x42/0x50
<6>[ 2004.633606]  blkdev_put+0x13b/0x1f0
<6>[ 2004.633615]  blkdev_release+0x2b/0x40
<6>[ 2004.633623]  __fput_sync+0x9b/0x2c0
<6>[ 2004.633632]  __se_sys_close+0x69/0xc0
<6>[ 2004.633639]  do_syscall_64+0x60/0x90
<6>[ 2004.633649]  ? exit_to_user_mode_prepare+0x49/0x130
<6>[ 2004.633657]  ? do_syscall_64+0x6f/0x90
<6>[ 2004.633665]  ? do_syscall_64+0x6f/0x90
<6>[ 2004.633672]  ? do_syscall_64+0x6f/0x90
<6>[ 2004.633680]  ? irq_exit_rcu+0x38/0x90
<6>[ 2004.633687]  ? exit_to_user_mode_prepare+0x49/0x130
<6>[ 2004.633694]  entry_SYSCALL_64_after_hwframe+0x73/0xdd
<6>[ 2004.633703] RIP: 0033:0x786d55239960
<6>[ 2004.633711] RSP: 002b:00007ffd1c6d8c28 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
<6>[ 2004.633719] RAX: ffffffffffffffda RBX: 00005a5ffe743fd0 RCX: 0000786d55239960
<6>[ 2004.633725] RDX: 0000786d55307b00 RSI: 0000000000000000 RDI: 000000000000000c
<6>[ 2004.633730] RBP: 00007ffd1c6d8d30 R08: 0000000000000007 R09: 00005a5ffe78a9f0
<6>[ 2004.633735] R10: 8a1ecef621fff8a0 R11: 0000000000000202 R12: 0000000000000831
<6>[ 2004.633741] R13: 00005a5ffe743f60 R14: 00005a5ffe743f80 R15: 000000000000000c
<6>[ 2004.633746]  </TASK>
* [PATCH 2/2] block: mark the disk dead before taking open_mutx in del_gendisk
  2024-10-09 11:38 try to avoid del_gendisk vs passthrough from ->release deadlocks v2 Christoph Hellwig
  2024-10-09 11:38 ` [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead Christoph Hellwig
@ 2024-10-09 11:38 ` Christoph Hellwig
  2024-10-16  4:15 ` YangYang
  2 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2024-10-09 11:38 UTC
To: Jens Axboe
Cc: Sergey Senozhatsky, YangYang, linux-block

Now that we stop sd and sr from submitting passthrough commands from
their ->release methods, we can and should start the drain before taking
->open_mutex, so that we entirely prevent this kind of deadlock by
ensuring that the disk is clearly marked dead before open_mutex is taken
in del_gendisk.

This includes a revert of commit 7e04da2dc701 ("block: fix deadlock
between sd_remove & sd_release"), which was a partial fix for a similar
deadlock.

Reported-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Suggested-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 block/genhd.c | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 7026569fa8a0be..c15e8f1163664b 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -655,16 +655,6 @@ void del_gendisk(struct gendisk *disk)
 	if (WARN_ON_ONCE(!disk_live(disk) && !(disk->flags & GENHD_FL_HIDDEN)))
 		return;
 
-	disk_del_events(disk);
-
-	/*
-	 * Prevent new openers by unlinked the bdev inode.
-	 */
-	mutex_lock(&disk->open_mutex);
-	xa_for_each(&disk->part_tbl, idx, part)
-		bdev_unhash(part);
-	mutex_unlock(&disk->open_mutex);
-
 	/*
 	 * Tell the file system to write back all dirty data and shut down if
 	 * it hasn't been notified earlier.
@@ -673,10 +663,22 @@ void del_gendisk(struct gendisk *disk)
 	blk_report_disk_dead(disk, false);
 
 	/*
-	 * Drop all partitions now that the disk is marked dead.
+	 * Then mark the disk dead to stop new requests from being served ASAP.
+	 * This needs to happen before taking ->open_mutex to prevent deadlocks
+	 * with SCSI ULPs that send passthrough commands from their ->release
+	 * methods.
 	 */
-	mutex_lock(&disk->open_mutex);
 	__blk_mark_disk_dead(disk);
+
+	disk_del_events(disk);
+
+	/*
+	 * Prevent new openers by unlinking the bdev inode, and drop all
+	 * partitions.
+	 */
+	mutex_lock(&disk->open_mutex);
+	xa_for_each(&disk->part_tbl, idx, part)
+		bdev_unhash(part);
 	xa_for_each_start(&disk->part_tbl, idx, part, 1)
 		drop_partition(part);
 	mutex_unlock(&disk->open_mutex);
-- 
2.45.2
* Re: [PATCH 2/2] block: mark the disk dead before taking open_mutx in del_gendisk
  2024-10-09 11:38 ` [PATCH 2/2] block: mark the disk dead before taking open_mutx in del_gendisk Christoph Hellwig
@ 2024-10-16  4:15 ` YangYang
  0 siblings, 0 replies; 28+ messages in thread
From: YangYang @ 2024-10-16  4:15 UTC
To: Christoph Hellwig, Jens Axboe
Cc: Sergey Senozhatsky, linux-block

On 2024/10/9 19:38, Christoph Hellwig wrote:
> Now that we stop sd and sr from submitting passthrough commands from
> their ->release methods, we can and should start the drain before taking
> ->open_mutex, so that we entirely prevent this kind of deadlock by
> ensuring that the disk is clearly marked dead before open_mutex is taken
> in del_gendisk.
>
> This includes a revert of commit 7e04da2dc701 ("block: fix deadlock
> between sd_remove & sd_release"), which was a partial fix for a similar
> deadlock.
>
> Reported-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> Suggested-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
>  block/genhd.c | 26 ++++++++++++++------------
>  1 file changed, 14 insertions(+), 12 deletions(-)
>
> diff --git a/block/genhd.c b/block/genhd.c
> index 7026569fa8a0be..c15e8f1163664b 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -655,16 +655,6 @@ void del_gendisk(struct gendisk *disk)
>  	if (WARN_ON_ONCE(!disk_live(disk) && !(disk->flags & GENHD_FL_HIDDEN)))
>  		return;
>  
> -	disk_del_events(disk);
> -
> -	/*
> -	 * Prevent new openers by unlinked the bdev inode.
> -	 */
> -	mutex_lock(&disk->open_mutex);
> -	xa_for_each(&disk->part_tbl, idx, part)
> -		bdev_unhash(part);
> -	mutex_unlock(&disk->open_mutex);
> -
>  	/*
>  	 * Tell the file system to write back all dirty data and shut down if
>  	 * it hasn't been notified earlier.
> @@ -673,10 +663,22 @@ void del_gendisk(struct gendisk *disk)
>  	blk_report_disk_dead(disk, false);
>  
>  	/*
> -	 * Drop all partitions now that the disk is marked dead.
> +	 * Then mark the disk dead to stop new requests from being served ASAP.
> +	 * This needs to happen before taking ->open_mutex to prevent deadlocks
> +	 * with SCSI ULPs that send passthrough commands from their ->release
> +	 * methods.
>  	 */
> -	mutex_lock(&disk->open_mutex);
>  	__blk_mark_disk_dead(disk);
> +
> +	disk_del_events(disk);
> +
> +	/*
> +	 * Prevent new openers by unlinking the bdev inode, and drop all
> +	 * partitions.
> +	 */
> +	mutex_lock(&disk->open_mutex);
> +	xa_for_each(&disk->part_tbl, idx, part)
> +		bdev_unhash(part);
>  	xa_for_each_start(&disk->part_tbl, idx, part, 1)
>  		drop_partition(part);
>  	mutex_unlock(&disk->open_mutex);

Looks good. Feel free to add:

Reviewed-by: Yang Yang <yang.yang@vivo.com>

Thanks.
* Re: try to avoid del_gendisk vs passthrough from ->release deadlocks v2
  2024-10-09 11:38 try to avoid del_gendisk vs passthrough from ->release deadlocks v2 Christoph Hellwig
  2024-10-09 11:38 ` [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead Christoph Hellwig
  2024-10-09 11:38 ` [PATCH 2/2] block: mark the disk dead before taking open_mutx in del_gendisk Christoph Hellwig
@ 2024-10-16  2:09 ` Sergey Senozhatsky
  2 siblings, 0 replies; 28+ messages in thread
From: Sergey Senozhatsky @ 2024-10-16  2:09 UTC
To: YangYang, Jens Axboe
Cc: Sergey Senozhatsky, linux-block, Christoph Hellwig

On (24/10/09 13:38), Christoph Hellwig wrote:
> Hi all,
>
> this is my attempted fix for the problem reported by Sergey in the
> "block: del_gendisk() vs blk_queue_enter() race condition" thread. As
> I don't have a reproducer, this is all just a best guess so far, so
> handle it with care!

Hi YangYang, Jens,

Are you OK with the series?
* RFC: try to avoid del_gendisk vs passthrough from ->release deadlocks
@ 2024-10-08 11:57 Christoph Hellwig
  2024-10-08 11:57 ` [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead Christoph Hellwig
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2024-10-08 11:57 UTC
To: Jens Axboe
Cc: Sergey Senozhatsky, YangYang, linux-block

Hi all,

this is my attempted fix for the problem reported by Sergey in the
"block: del_gendisk() vs blk_queue_enter() race condition" thread. As
I don't have a reproducer, this is all just a best guess so far, so
handle it with care!

Diffstat:
 block/genhd.c          | 40 ++++++++++++++++++++++++++--------------
 include/linux/blkdev.h |  1 +
 2 files changed, 27 insertions(+), 14 deletions(-)
* [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead
  2024-10-08 11:57 RFC: try to avoid del_gendisk vs passthrough from ->release deadlocks Christoph Hellwig
@ 2024-10-08 11:57 ` Christoph Hellwig
  2024-10-09  5:06 ` Sergey Senozhatsky
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2024-10-08 11:57 UTC
To: Jens Axboe
Cc: Sergey Senozhatsky, YangYang, linux-block

When del_gendisk shuts down access to a gendisk, it can deadlock with
sd or sr, which try to submit passthrough SCSI commands from their
->release method under open_mutex. The submission can be blocked in
blk_enter_queue while del_gendisk can't get to actually telling them to
stop and waking them up.

As the disk is going away there is no real point in sending these
commands, but we have no really good way to distinguish between the
cases. For now, mark even standalone (aka SCSI) queues as dying in
del_gendisk to avoid this deadlock; the real fix will be to split
freeing a disk from freezing a queue for non-disk-associated requests.

Reported-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/genhd.c          | 14 ++++++++++++--
 include/linux/blkdev.h |  1 +
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 1c05dd4c6980b5..ac1c496ad43343 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -589,8 +589,16 @@ static void __blk_mark_disk_dead(struct gendisk *disk)
 	if (test_and_set_bit(GD_DEAD, &disk->state))
 		return;
 
-	if (test_bit(GD_OWNS_QUEUE, &disk->state))
-		blk_queue_flag_set(QUEUE_FLAG_DYING, disk->queue);
+	/*
+	 * Also mark the disk dead if it is not owned by the gendisk.  This
+	 * means we can't allow /dev/sg passthrough or SCSI internal commands
+	 * while unbinding a ULP.  That is more than just a bit ugly, but until
+	 * we untangle q_usage_counter into one owned by the disk and one owned
+	 * by the queue this is as good as it gets.  The flag will be cleared
+	 * at the end of del_gendisk if it wasn't set before.
+	 */
+	if (!test_and_set_bit(QUEUE_FLAG_DYING, &disk->queue->queue_flags))
+		set_bit(QUEUE_FLAG_RESURRECT, &disk->queue->queue_flags);
 
 	/*
 	 * Stop buffered writers from dirtying pages that can't be written out.
@@ -719,6 +727,8 @@ void del_gendisk(struct gendisk *disk)
 	 * again.  Else leave the queue frozen to fail all I/O.
 	 */
 	if (!test_bit(GD_OWNS_QUEUE, &disk->state)) {
+		if (test_bit(QUEUE_FLAG_RESURRECT, &q->queue_flags))
+			clear_bit(QUEUE_FLAG_DYING, &q->queue_flags);
 		blk_queue_flag_clear(QUEUE_FLAG_INIT_DONE, q);
 		__blk_mq_unfreeze_queue(q, true);
 	} else {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 50c3b959da2816..391e3eb3bb5e61 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -590,6 +590,7 @@ struct request_queue {
 /* Keep blk_queue_flag_name[] in sync with the definitions below */
 enum {
 	QUEUE_FLAG_DYING,		/* queue being torn down */
+	QUEUE_FLAG_RESURRECT,		/* temporarily dying */
 	QUEUE_FLAG_NOMERGES,		/* disable merge attempts */
 	QUEUE_FLAG_SAME_COMP,		/* complete on same CPU-group */
 	QUEUE_FLAG_FAIL_IO,		/* fake timeout */
-- 
2.45.2
* Re: [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead
  2024-10-08 11:57 ` [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead Christoph Hellwig
@ 2024-10-09  5:06 ` Sergey Senozhatsky
  2024-10-09  7:34 ` Christoph Hellwig
  0 siblings, 1 reply; 28+ messages in thread
From: Sergey Senozhatsky @ 2024-10-09  5:06 UTC
To: Christoph Hellwig
Cc: Jens Axboe, Sergey Senozhatsky, YangYang, linux-block

On (24/10/08 13:57), Christoph Hellwig wrote:
> When del_gendisk shuts down access to a gendisk, it can deadlock with
> sd or sr, which try to submit passthrough SCSI commands from their
> ->release method under open_mutex. The submission can be blocked in
> blk_enter_queue while del_gendisk can't get to actually telling them to
> stop and waking them up.
>
> As the disk is going away there is no real point in sending these
> commands, but we have no really good way to distinguish between the
> cases. For now, mark even standalone (aka SCSI) queues as dying in
> del_gendisk to avoid this deadlock; the real fix will be to split
> freeing a disk from freezing a queue for non-disk-associated requests.
>
> Reported-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

FWIW
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>

[..]
> 	if (!test_bit(GD_OWNS_QUEUE, &disk->state)) {
> +		if (test_bit(QUEUE_FLAG_RESURRECT, &q->queue_flags))
> +			clear_bit(QUEUE_FLAG_DYING, &q->queue_flags);

Don't know if we also want to clear QUEUE_FLAG_RESURRECT here, just in
case.

> 		blk_queue_flag_clear(QUEUE_FLAG_INIT_DONE, q);
> 		__blk_mq_unfreeze_queue(q, true);
> 	} else {
* Re: [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead
  2024-10-09  5:06 ` Sergey Senozhatsky
@ 2024-10-09  7:34 ` Christoph Hellwig
  0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2024-10-09  7:34 UTC
To: Sergey Senozhatsky
Cc: Christoph Hellwig, Jens Axboe, YangYang, linux-block

On Wed, Oct 09, 2024 at 02:06:02PM +0900, Sergey Senozhatsky wrote:
> > 	if (!test_bit(GD_OWNS_QUEUE, &disk->state)) {
> > +		if (test_bit(QUEUE_FLAG_RESURRECT, &q->queue_flags))
> > +			clear_bit(QUEUE_FLAG_DYING, &q->queue_flags);
>
> Don't know if we also want to clear QUEUE_FLAG_RESURRECT here, just in
> case.

Yes, we really should do that.
end of thread, other threads: [~2024-10-28  5:44 UTC | newest]

Thread overview: 28+ messages -- links below jump to the message on this page:

2024-10-09 11:38 try to avoid del_gendisk vs passthrough from ->release deadlocks v2 Christoph Hellwig
2024-10-09 11:38 ` [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead Christoph Hellwig
2024-10-09 12:31   ` Sergey Senozhatsky
2024-10-09 12:41     ` Christoph Hellwig
2024-10-09 12:43       ` Sergey Senozhatsky
2024-10-09 13:49         ` Jens Axboe
2024-10-16  4:14   ` YangYang
2024-10-16 11:09   ` Ming Lei
2024-10-16 12:32     ` Christoph Hellwig
2024-10-16 12:49       ` Ming Lei
2024-10-16 13:35         ` Ming Lei
2024-10-19  1:25           ` Sergey Senozhatsky
2024-10-19 12:32             ` Ming Lei
2024-10-19 12:37               ` Sergey Senozhatsky
2024-10-19 12:50                 ` Ming Lei
2024-10-19 12:58                   ` Sergey Senozhatsky
2024-10-19 13:09                     ` Ming Lei
2024-10-19 13:50                       ` Sergey Senozhatsky
2024-10-19 15:03                         ` Ming Lei
2024-10-19 15:11                           ` Sergey Senozhatsky
2024-10-19 15:40                           ` Sergey Senozhatsky
2024-10-28  5:44                           ` Sergey Senozhatsky
2024-10-09 11:38 ` [PATCH 2/2] block: mark the disk dead before taking open_mutx in del_gendisk Christoph Hellwig
2024-10-16  4:15   ` YangYang
2024-10-16  2:09 ` try to avoid del_gendisk vs passthrough from ->release deadlocks v2 Sergey Senozhatsky

-- strict thread matches above, loose matches on Subject: below --
2024-10-08 11:57 RFC: try to avoid del_gendisk vs passthrough from ->release deadlocks Christoph Hellwig
2024-10-08 11:57 ` [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead Christoph Hellwig
2024-10-09  5:06   ` Sergey Senozhatsky
2024-10-09  7:34     ` Christoph Hellwig