RFC: try to avoid del_gendisk vs passthrough from ->release deadlocks

linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* RFC: try to avoid del_gendisk vs passthrough from ->release deadlocks
@ 2024-10-08 11:57 Christoph Hellwig
  2024-10-08 11:57 ` [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead Christoph Hellwig
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Christoph Hellwig @ 2024-10-08 11:57 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Sergey Senozhatsky, YangYang, linux-block

Hi all,

this is my attempted fix for the problem reported by Sergey in the
"block: del_gendisk() vs blk_queue_enter() race condition" thread.  As
I don't have a reproducer this is all just best guest so far, so handle
it with care!


Diffstat
 block/genhd.c          |   40 ++++++++++++++++++++++++++--------------
 include/linux/blkdev.h |    1 +
 2 files changed, 27 insertions(+), 14 deletions(-)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead
  2024-10-08 11:57 RFC: try to avoid del_gendisk vs passthrough from ->release deadlocks Christoph Hellwig
@ 2024-10-08 11:57 ` Christoph Hellwig
  2024-10-09  5:06   ` Sergey Senozhatsky
  2024-10-08 11:57 ` [PATCH 2/2] block: mark the disk dead before taking open_mutx in del_gendisk Christoph Hellwig
  2024-10-08 15:23 ` RFC: try to avoid del_gendisk vs passthrough from ->release deadlocks Sergey Senozhatsky
  2 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2024-10-08 11:57 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Sergey Senozhatsky, YangYang, linux-block

When del_gendisk shuts down access to a gendisk, it could lead to a
deadlock with sd or, which try to submit passthrough SCSI commands from
their ->release method under open_mutex.  The submission can be blocked
in blk_enter_queue while del_gendisk can't get to actually telling them
top stop and wake them up.

As the disk is going away there is no real point in sending these
commands, but we have no really good way to distinguish between the
cases.  For now mark even standalone (aka SCSI queues) as dying in
del_gendisk to avoid this deadlock, but the real fix will be to split
freeing a disk from freezing a queue for not disk associated requests.

Reported-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/genhd.c          | 14 ++++++++++++--
 include/linux/blkdev.h |  1 +
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 1c05dd4c6980b5..ac1c496ad43343 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -589,8 +589,16 @@ static void __blk_mark_disk_dead(struct gendisk *disk)
 	if (test_and_set_bit(GD_DEAD, &disk->state))
 		return;
 
-	if (test_bit(GD_OWNS_QUEUE, &disk->state))
-		blk_queue_flag_set(QUEUE_FLAG_DYING, disk->queue);
+	/*
+	 * Also mark the disk dead if it is not owned by the gendisk.  This
+	 * means we can't allow /dev/sg passthrough or SCSI internal commands
+	 * while unbinding a ULP.  That is more than just a bit ugly, but until
+	 * we untangle q_usage_counter into one owned by the disk and one owned
+	 * by the queue this is as good as it gets.  The flag will be cleared
+	 * at the end of del_gendisk if it wasn't set before.
+	 */
+	if (!test_and_set_bit(QUEUE_FLAG_DYING, &disk->queue->queue_flags))
+		set_bit(QUEUE_FLAG_RESURRECT, &disk->queue->queue_flags);
 
 	/*
 	 * Stop buffered writers from dirtying pages that can't be written out.
@@ -719,6 +727,8 @@ void del_gendisk(struct gendisk *disk)
 	 * again.  Else leave the queue frozen to fail all I/O.
 	 */
 	if (!test_bit(GD_OWNS_QUEUE, &disk->state)) {
+		if (test_bit(QUEUE_FLAG_RESURRECT, &q->queue_flags))
+			clear_bit(QUEUE_FLAG_DYING, &q->queue_flags);
 		blk_queue_flag_clear(QUEUE_FLAG_INIT_DONE, q);
 		__blk_mq_unfreeze_queue(q, true);
 	} else {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 50c3b959da2816..391e3eb3bb5e61 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -590,6 +590,7 @@ struct request_queue {
 /* Keep blk_queue_flag_name[] in sync with the definitions below */
 enum {
 	QUEUE_FLAG_DYING,		/* queue being torn down */
+	QUEUE_FLAG_RESURRECT,		/* temporarily dying */
 	QUEUE_FLAG_NOMERGES,		/* disable merge attempts */
 	QUEUE_FLAG_SAME_COMP,		/* complete on same CPU-group */
 	QUEUE_FLAG_FAIL_IO,		/* fake timeout */
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead
  2024-10-08 11:57 ` [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead Christoph Hellwig
@ 2024-10-09  5:06   ` Sergey Senozhatsky
  2024-10-09  7:34     ` Christoph Hellwig
  0 siblings, 1 reply; 7+ messages in thread
From: Sergey Senozhatsky @ 2024-10-09  5:06 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Sergey Senozhatsky, YangYang, linux-block

On (24/10/08 13:57), Christoph Hellwig wrote:
> When del_gendisk shuts down access to a gendisk, it could lead to a
> deadlock with sd or, which try to submit passthrough SCSI commands from
> their ->release method under open_mutex.  The submission can be blocked
> in blk_enter_queue while del_gendisk can't get to actually telling them
> top stop and wake them up.
> 
> As the disk is going away there is no real point in sending these
> commands, but we have no really good way to distinguish between the
> cases.  For now mark even standalone (aka SCSI queues) as dying in
> del_gendisk to avoid this deadlock, but the real fix will be to split
> freeing a disk from freezing a queue for not disk associated requests.
> 
> Reported-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

FWIW
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>

[..]

>  	if (!test_bit(GD_OWNS_QUEUE, &disk->state)) {
> +		if (test_bit(QUEUE_FLAG_RESURRECT, &q->queue_flags))
> +			clear_bit(QUEUE_FLAG_DYING, &q->queue_flags);

Don't know if we also want to clear QUEUE_FLAG_RESURRECT here, just in
case.

>  		blk_queue_flag_clear(QUEUE_FLAG_INIT_DONE, q);
>  		__blk_mq_unfreeze_queue(q, true);
>  	} else {

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead
  2024-10-09  5:06   ` Sergey Senozhatsky
@ 2024-10-09  7:34     ` Christoph Hellwig
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2024-10-09  7:34 UTC (permalink / raw)
  To: Sergey Senozhatsky; +Cc: Christoph Hellwig, Jens Axboe, YangYang, linux-block

On Wed, Oct 09, 2024 at 02:06:02PM +0900, Sergey Senozhatsky wrote:
> >  	if (!test_bit(GD_OWNS_QUEUE, &disk->state)) {
> > +		if (test_bit(QUEUE_FLAG_RESURRECT, &q->queue_flags))
> > +			clear_bit(QUEUE_FLAG_DYING, &q->queue_flags);
> 
> Don't know if we also want to clear QUEUE_FLAG_RESURRECT here, just in
> case.

Yes, we really should do that.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 2/2] block: mark the disk dead before taking open_mutx in del_gendisk
  2024-10-08 11:57 RFC: try to avoid del_gendisk vs passthrough from ->release deadlocks Christoph Hellwig
  2024-10-08 11:57 ` [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead Christoph Hellwig
@ 2024-10-08 11:57 ` Christoph Hellwig
  2024-10-09  5:04   ` Sergey Senozhatsky
  2024-10-08 15:23 ` RFC: try to avoid del_gendisk vs passthrough from ->release deadlocks Sergey Senozhatsky
  2 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2024-10-08 11:57 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Sergey Senozhatsky, YangYang, linux-block

Now that we stop sd and sr from submitting passthrough commands from
their ->release methods we can and should start the drain before taking
->open_mutex, so that we can entirely prevent this kind of deadlock by
ensuring that the disk is clearly marked dead before open_mutex is
taken in del_gendisk.

This includes a revert of commit 7e04da2dc701 ("block: fix deadlock
between sd_remove & sd_release"), which was a partial fix for a similar
deadlock.

Reported-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Suggested-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/genhd.c | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index ac1c496ad43343..98c8d13171c642 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -655,16 +655,6 @@ void del_gendisk(struct gendisk *disk)
 	if (WARN_ON_ONCE(!disk_live(disk) && !(disk->flags & GENHD_FL_HIDDEN)))
 		return;
 
-	disk_del_events(disk);
-
-	/*
-	 * Prevent new openers by unlinked the bdev inode.
-	 */
-	mutex_lock(&disk->open_mutex);
-	xa_for_each(&disk->part_tbl, idx, part)
-		bdev_unhash(part);
-	mutex_unlock(&disk->open_mutex);
-
 	/*
 	 * Tell the file system to write back all dirty data and shut down if
 	 * it hasn't been notified earlier.
@@ -673,10 +663,22 @@ void del_gendisk(struct gendisk *disk)
 		blk_report_disk_dead(disk, false);
 
 	/*
-	 * Drop all partitions now that the disk is marked dead.
+	 * Then mark the disk dead to stop new requests from being served ASAP.
+	 * This needs to happen before taking ->open_mutex to prevent deadlocks
+	 * with SCSI ULPs that send passthrough commands from their ->release
+	 * methods.
 	 */
-	mutex_lock(&disk->open_mutex);
 	__blk_mark_disk_dead(disk);
+
+	disk_del_events(disk);
+
+	/*
+	 * Prevent new openers by unlinking the bdev inode, and drop all
+	 * partitions.
+	 */
+	mutex_lock(&disk->open_mutex);
+	xa_for_each(&disk->part_tbl, idx, part)
+		bdev_unhash(part);
 	xa_for_each_start(&disk->part_tbl, idx, part, 1)
 		drop_partition(part);
 	mutex_unlock(&disk->open_mutex);
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] block: mark the disk dead before taking open_mutx in del_gendisk
  2024-10-08 11:57 ` [PATCH 2/2] block: mark the disk dead before taking open_mutx in del_gendisk Christoph Hellwig
@ 2024-10-09  5:04   ` Sergey Senozhatsky
  0 siblings, 0 replies; 7+ messages in thread
From: Sergey Senozhatsky @ 2024-10-09  5:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Sergey Senozhatsky, YangYang, linux-block

On (24/10/08 13:57), Christoph Hellwig wrote:
> Now that we stop sd and sr from submitting passthrough commands from
> their ->release methods we can and should start the drain before taking
> ->open_mutex, so that we can entirely prevent this kind of deadlock by
> ensuring that the disk is clearly marked dead before open_mutex is
> taken in del_gendisk.
> 
> This includes a revert of commit 7e04da2dc701 ("block: fix deadlock
> between sd_remove & sd_release"), which was a partial fix for a similar
> deadlock.
> 
> Reported-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> Suggested-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

FWIW
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RFC: try to avoid del_gendisk vs passthrough from ->release deadlocks
  2024-10-08 11:57 RFC: try to avoid del_gendisk vs passthrough from ->release deadlocks Christoph Hellwig
  2024-10-08 11:57 ` [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead Christoph Hellwig
  2024-10-08 11:57 ` [PATCH 2/2] block: mark the disk dead before taking open_mutx in del_gendisk Christoph Hellwig
@ 2024-10-08 15:23 ` Sergey Senozhatsky
  2 siblings, 0 replies; 7+ messages in thread
From: Sergey Senozhatsky @ 2024-10-08 15:23 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Sergey Senozhatsky, YangYang, linux-block

On (24/10/08 13:57), Christoph Hellwig wrote:
> Hi all,
> 
> this is my attempted fix for the problem reported by Sergey in the
> "block: del_gendisk() vs blk_queue_enter() race condition" thread.  As
> I don't have a reproducer this is all just best guest so far, so handle
> it with care!

This looks promising.  Thanks for working on this.
Let me try to test it (albeit without a reproducer).

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-10-09  7:34 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-08 11:57 RFC: try to avoid del_gendisk vs passthrough from ->release deadlocks Christoph Hellwig
2024-10-08 11:57 ` [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead Christoph Hellwig
2024-10-09  5:06   ` Sergey Senozhatsky
2024-10-09  7:34     ` Christoph Hellwig
2024-10-08 11:57 ` [PATCH 2/2] block: mark the disk dead before taking open_mutx in del_gendisk Christoph Hellwig
2024-10-09  5:04   ` Sergey Senozhatsky
2024-10-08 15:23 ` RFC: try to avoid del_gendisk vs passthrough from ->release deadlocks Sergey Senozhatsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).