From: Sergey Senozhatsky <senozhatsky@chromium.org>
To: Christoph Hellwig <hch@infradead.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>,
	Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org, Yang Yang <yang.yang@vivo.com>
Subject: Re: block: del_gendisk() vs blk_queue_enter() race condition
Date: Fri, 4 Oct 2024 16:48:18 +0900
Message-ID: <20241004074818.GP11458@google.com>
In-Reply-To: <Zv-O9tldIzPfD8ju@infradead.org>

On (24/10/03 23:45), Christoph Hellwig wrote:
> Date: Thu, 3 Oct 2024 23:45:10 -0700
> From: Christoph Hellwig <hch@infradead.org>
> To: Sergey Senozhatsky <senozhatsky@chromium.org>
> Cc: Christoph Hellwig <hch@infradead.org>, Jens Axboe <axboe@kernel.dk>,
>  linux-block@vger.kernel.org, Yang Yang <yang.yang@vivo.com>
> Subject: Re: block: del_gendisk() vs blk_queue_enter() race condition
> Message-ID: <Zv-O9tldIzPfD8ju@infradead.org>
> 
> On Fri, Oct 04, 2024 at 01:21:27PM +0900, Sergey Senozhatsky wrote:
> > Dunno. Is something like this completely silly?
> 
> __blk_mark_disk_dead got moved into the lock by: 7e04da2dc701 
> ("block: fix deadlock between sd_remove & sd_release"), which has a trace
> that looks very similar to the one you reported.

Hmm, okay, a deadlock one way or another.

> And that commit also points out something I missed - we do not set
> QUEUE_FLAG_DYING here because the gendisk does not own the queue for
> SCSI.  Because of that allocating the request in sd/sr will not fail, and
> it will deadlock.

I see.  Thanks for the pointers.
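
To spell the ownership point out, here is a minimal user-space sketch of
the flag logic only. It is not kernel code: the sketch_* names are made up
for illustration, and the real __blk_mark_disk_dead() does more than set
flags. It assumes a waiter on a frozen queue can give up only once the
queue is marked dying, which is the behaviour before the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_queue {
	bool dying;			/* stands in for QUEUE_FLAG_DYING */
};

struct sketch_disk {
	bool dead;			/* stands in for GD_DEAD */
	bool owns_queue;		/* stands in for GD_OWNS_QUEUE */
	struct sketch_queue *queue;	/* stands in for disk->queue */
};

/* Mirrors only the flag handling of __blk_mark_disk_dead(). */
void sketch_mark_disk_dead(struct sketch_disk *disk)
{
	assert(disk != NULL && disk->queue != NULL);

	if (disk->dead)
		return;
	disk->dead = true;

	/* Only a disk that owns its queue marks the queue as dying. */
	if (disk->owns_queue)
		disk->queue->dying = true;
}

/* A waiter on a frozen queue bails out only if the queue is dying. */
bool sketch_waiter_can_bail_out(const struct sketch_queue *q)
{
	return q->dying;
}
```

With owns_queue set to false (the SCSI case described above), the disk ends
up dead but the queue is never marked dying, so the waiter has nothing to
wake up for.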

> So I think the short-term fix is to also fail passthrough requests here -
> either by clearing and resurrecting QUEUE_FLAG_DYING or by also checking
> q->disk for GD_DEAD if it exists.  Both of these are a bit ugly because
> they will fail passthrough through /dev/sg during the removal which is
> unexpected (although probably not happening for usual workloads).

You are way ahead of me.  Does the below diff look like "checking for
GD_DEAD"?

---

diff --git a/block/blk-core.c b/block/blk-core.c
index bc5e8c5eaac9..ccd36cb5ada7 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -292,6 +292,16 @@ void blk_queue_start_drain(struct request_queue *q)
 	wake_up_all(&q->mq_freeze_wq);
 }
 
+void blk_queue_disk_dead(struct request_queue *q)
+{
+	struct gendisk *disk = q->disk;
+
+	if (WARN_ON_ONCE(!test_bit(GD_DEAD, &disk->state)))
+		return;
+	/* Make blk_queue_enter() reexamine the GD_DEAD flag. */
+	wake_up_all(&q->mq_freeze_wq);
+}
+
 /**
  * blk_queue_enter() - try to increase q->q_usage_counter
  * @q: request queue pointer
@@ -302,6 +312,8 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 	const bool pm = flags & BLK_MQ_REQ_PM;
 
 	while (!blk_try_enter_queue(q, pm)) {
+		struct gendisk *disk = q->disk;
+
 		if (flags & BLK_MQ_REQ_NOWAIT)
 			return -EAGAIN;
 
@@ -316,8 +328,9 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 		wait_event(q->mq_freeze_wq,
 			   (!q->mq_freeze_depth &&
 			    blk_pm_resume_queue(pm, q)) ||
-			   blk_queue_dying(q));
-		if (blk_queue_dying(q))
+			   blk_queue_dying(q) ||
+			   (disk && test_bit(GD_DEAD, &disk->state)));
+		if (blk_queue_dying(q) || (disk && test_bit(GD_DEAD, &disk->state)))
 			return -ENODEV;
 	}
 
diff --git a/block/genhd.c b/block/genhd.c
index 1c05dd4c6980..c213a0cf8268 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -583,12 +583,6 @@ static void blk_report_disk_dead(struct gendisk *disk, bool surprise)
 
 static void __blk_mark_disk_dead(struct gendisk *disk)
 {
-	/*
-	 * Fail any new I/O.
-	 */
-	if (test_and_set_bit(GD_DEAD, &disk->state))
-		return;
-
 	if (test_bit(GD_OWNS_QUEUE, &disk->state))
 		blk_queue_flag_set(QUEUE_FLAG_DYING, disk->queue);
 
@@ -649,6 +643,12 @@ void del_gendisk(struct gendisk *disk)
 
 	disk_del_events(disk);
 
+	/*
+	 * Fail any new I/O.
+	 */
+	set_bit(GD_DEAD, &disk->state);
+	blk_queue_disk_dead(disk->queue);
+
 	/*
 	 * Prevent new openers by unlinked the bdev inode.
 	 */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 50c3b959da28..aaaa6fa12328 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -862,6 +862,7 @@ extern int blk_lld_busy(struct request_queue *q);
 extern int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags);
 extern void blk_queue_exit(struct request_queue *q);
 extern void blk_sync_queue(struct request_queue *q);
+void blk_queue_disk_dead(struct request_queue *q);
 
 /* Helper to convert REQ_OP_XXX to its string format XXX */
 extern const char *blk_op_str(enum req_op op);

---

> The proper fix would be to split the freezing mechanism for file system
> vs passthrough I/O, but that's going to be a huge change.

My preference would be a simpler short-term fix (cherry-picks to older
kernels are much easier this way).
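
For reference, the outcome the short-term fix is after can be modelled in
user space roughly as follows. This is a simplified sketch under stated
assumptions, not the kernel implementation: the model_* names and
MODEL_WOULD_WAIT are hypothetical, "frozen" stands in for
blk_try_enter_queue() failing, and the blocking wait_event() is reduced to
a return code. The disk pointer is checked for NULL because a queue may
have no gendisk attached.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_WOULD_WAIT 1		/* hypothetical: caller would keep sleeping */

struct model_disk {
	bool dead;			/* stands in for GD_DEAD */
};

struct model_queue {
	bool frozen;			/* stands in for blk_try_enter_queue() failing */
	bool dying;			/* stands in for QUEUE_FLAG_DYING */
	struct model_disk *disk;	/* stands in for q->disk, may be NULL */
};

/* True only when a disk is attached and it has been marked dead. */
static bool model_disk_dead(const struct model_queue *q)
{
	return q->disk != NULL && q->disk->dead;
}

/*
 * Returns 0 if the caller enters the queue, -ENODEV if it must give up,
 * and MODEL_WOULD_WAIT if it would stay asleep on the freeze wait queue.
 */
int model_queue_enter(const struct model_queue *q)
{
	assert(q != NULL);

	if (!q->frozen)
		return 0;
	if (q->dying || model_disk_dead(q))
		return -ENODEV;
	return MODEL_WOULD_WAIT;
}
```

In this model a frozen queue whose disk is dead yields -ENODEV even though
the queue itself is not dying, while a frozen queue with no disk attached
keeps waiting as before.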
